Optimizing your very large site for search — Part 2

Large websites face many critical issues when optimizing for search. In Part 1 of this series of posts, we discussed the importance of reducing the number of URLs you expose through canonicalization. But there are other ways to reduce the surface area your site presents to search engines and to focus crawlers on the pages that matter.

While you may have reduced the number of URLs you expose to Live Search, a large site can still have a large surface area to crawl. While crawling your site, search engines may miss some of your best content, or they may consume bandwidth that you pay for unnecessarily. This is where HTTP compression and conditional GET can help.

 

Enabling HTTP compression

 

Whether or not you are concerned with bandwidth control, setting up HTTP compression is a best practice for every site owner. What is HTTP compression? It is a capability defined in the HTTP 1.1 specification under the name “content-encoding.” When a web server receives a request for a file, it checks whether the client browser (or crawler) has declared itself “compression enabled,” which the client does through the Accept-Encoding request header, before serving a compressed copy of the file.

Most people are familiar with the ZIP file format of data compression, where files are added to a ZIP archive and then extracted as needed. This is not how HTTP compression works. Web servers use HTTP compression to compress documents in real time as they are transferred to the browser. The browser then decompresses the file and displays it as intended.
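To make the negotiation concrete, here is a minimal Python sketch of the decision a server makes for each request. It is an illustration under stated assumptions, not how Apache or IIS implement it: the function names accepts_gzip and build_response are hypothetical, and the Accept-Encoding parsing is deliberately simplified (it only looks for the gzip token and its q-value).

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding header lists gzip with a non-zero q-value.

    Simplified on purpose: wildcards ("*") and other codings are ignored.
    """
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "gzip":
            continue
        q = 1.0  # a coding without an explicit q-value is fully acceptable
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if q > 0:
            return True
    return False


def build_response(body: bytes, accept_encoding: str = ""):
    """Return (headers, body), compressing only when the client asked for gzip."""
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # Tell caches that the response differs depending on Accept-Encoding.
        "Vary": "Accept-Encoding",
    }
    if accepts_gzip(accept_encoding):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


if __name__ == "__main__":
    page = b"<html><body>" + b"<p>hello crawler</p>" * 200 + b"</body></html>"
    for header in ("gzip, deflate", ""):
        response_headers, payload = build_response(page, header)
        print(repr(header), response_headers.get("Content-Encoding"), len(payload))
```

A client that does not send Accept-Encoding gets the original bytes untouched, which is why turning compression on does not break older clients that never ask for it.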

 

Not all files are created equal

 

Certain file types are not suitable for HTTP compression. For example, files that have already been compressed, such as JPEGs, GIFs, movies, or standard compressed files (e.g. ZIP, gzip, and RAR), are not going to compress further with HTTP compression turned on. However, sites that have a lot of plain text content, including the main HTML files, XML, CSS, and RSS, will benefit from HTTP compression. For example, most standard HTML text files will compress to about half their original size, and sometimes smaller.
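You can see the difference between file types with a few lines of Python. This is a rough sketch, not a benchmark: the HTML sample is artificially repetitive, so it compresses far better than a real page would, and random bytes are used here as a stand-in (an assumption) for already-compressed payloads such as JPEG or ZIP data, which look statistically similar to a compressor.

```python
import gzip
import os


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(gzip.compress(data)) / len(data)


# Hypothetical sample page: markup-heavy and repetitive, like many templates.
html = (
    b"<html><body>"
    + b"<div class='item'><a href='/page'>A link</a></div>\n" * 500
    + b"</body></html>"
)

# Stand-in for an already-compressed file (assumption, see the note above).
random_like = os.urandom(50_000)

for label, payload in [("HTML text", html), ("random bytes", random_like)]:
    print(f"{label}: compressed/original = {gzip_ratio(payload):.2f}")
```

Running it on a few of your own pages and images is a quick way to decide which MIME types are worth compressing.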

 

Setting up Apache

 

If your site is running Apache, you can use the mod_deflate module, which adds an output filter that compresses content with gzip before it is sent. You can apply the filter site-wide or selectively, compressing only specific MIME types. The MIME type is determined from the Content-Type header of the response, whether that header is generated automatically by Apache or set by a CGI script or other dynamic code you create.

To enable compression for all MIME types, apply the SetOutputFilter directive to a website or directory:

<Directory "/web/mysite/php/">
    SetOutputFilter DEFLATE
</Directory>

To enable compression on a specific MIME type (as in this example, “text/html”), use the AddOutputFilterByType directive:

AddOutputFilterByType DEFLATE text/html 

Every site is different. If you need to support older browsers, you may need to have more advanced configurations. You can read more in the mod_deflate documentation.

 

Setting up IIS 7

 

Fortunately for most site owners, IIS 7 has HTTP compression for static files enabled by default. However, if you want to compress all files, you have to manually turn on Dynamic Compression.

You can do this by opening IIS Manager and double-clicking the Compression feature.

  IIS7-1

You’ll notice Enable static content compression is selected by default. To enable Dynamic Compression, simply select the Enable dynamic content compression option.

IIS7-2

 

Setting up IIS 6

 

IIS 6 also includes a native compression system and can be configured to compress both static and dynamic content. To enable HTTP compression in IIS 6, open the Properties page for the Web Sites folder in IIS Manager, which holds the global properties for all sites. On the Service tab, you can configure the options in the HTTP compression section.

 IIS6-1

Both versions of IIS also cache the compressed files in a directory, which improves performance by eliminating the need to re-compress them on the fly. As with Apache, IIS lets you select which MIME types to compress. TechNet has more information about selective compression in IIS.

 

Did your content change since last time we visited it?

 

As with HTTP compression, the official HTTP 1.1 specification defines a way to state when a document was last updated. When Live Search crawls your site, we ask whether each document has changed since we last looked at it. If it has, give us the latest version. If it is unchanged, just let us know and send nothing else. This mechanism is referred to as conditional GET, and by implementing it, you can save yourself bandwidth and save us the cost of comparing files we already have in the index.
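From the crawler's side, the exchange can be sketched as a small cache that remembers the Last-Modified value for each URL and sends it back as If-Modified-Since on the next visit. This is only an illustration of the protocol, not a description of how the Live Search crawler is built; CrawlCache and CachedPage are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class CachedPage:
    last_modified: str  # the Last-Modified header value, stored verbatim
    body: bytes


@dataclass
class CrawlCache:
    pages: dict = field(default_factory=dict)

    def request_headers(self, url: str) -> dict:
        """Headers for the next fetch: conditional only if we hold a copy."""
        cached = self.pages.get(url)
        if cached is None:
            return {}
        return {"If-Modified-Since": cached.last_modified}

    def handle_response(self, url: str, status: int, headers: dict, body: bytes) -> bytes:
        """Update the cache from a response and return the current body."""
        if status == 304:
            # Unchanged: the server sent no body, so reuse the stored copy.
            return self.pages[url].body
        if status == 200:
            last_modified = headers.get("Last-Modified")
            if last_modified:
                self.pages[url] = CachedPage(last_modified, body)
            return body
        raise ValueError(f"unexpected status {status}")


if __name__ == "__main__":
    cache = CrawlCache()
    url = "http://example.com/page.html"  # placeholder URL
    print(cache.request_headers(url))  # first visit: no conditional header
    cache.handle_response(
        url, 200, {"Last-Modified": "Fri, 10 Jul 2009 14:51:37 GMT"}, b"<html>v1</html>"
    )
    print(cache.request_headers(url))  # next visit: sends If-Modified-Since
```

The 304 branch is where both sides save: the server skips sending the body, and the crawler skips re-processing a page it already has.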

Additionally, it allows us to spend our crawl time looking at files that we may not have previously indexed, which could improve your coverage over time. The following chart demonstrates the potential gain in coverage with conditional GET versus a site without conditional GET.

 Chart

 

Implementing the conditional statements

 

There are a lot of factors to consider when implementing conditional GET, depending on the web server, programming language, or content management system used. Fortunately, both IIS and Apache have native support for the Last-Modified / If-Modified-Since / 304 Not Modified functionality for static files. For dynamic files, you may need to implement a code-based solution. A good pattern for this and an equally good description of how the crawler responds to conditional GET can be found in the article, “Save bandwidth costs: Dynamic pages can support If-Modified-Since too.”
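For a dynamic page, one possible code-based approach is sketched below as a small Python WSGI application. It is not the pattern from the article cited above, and it rests on assumptions: CONTENT_UPDATED stands in for whatever timestamp your application keeps for the underlying content (for example a database column), and render_page is a hypothetical placeholder for your page generation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical: when the underlying content last changed.
CONTENT_UPDATED = datetime(2009, 7, 10, 14, 51, 37, tzinfo=timezone.utc)


def render_page() -> bytes:
    """Placeholder for the expensive dynamic rendering step."""
    return b"<html><body>Dynamic content</body></html>"


def not_modified(if_modified_since, last_modified: datetime) -> bool:
    """True if the client's copy is at least as new as our content."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: fall back to a full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return last_modified.replace(microsecond=0) <= since


def app(environ, start_response):
    last_modified_header = format_datetime(CONTENT_UPDATED, usegmt=True)
    if not_modified(environ.get("HTTP_IF_MODIFIED_SINCE"), CONTENT_UPDATED):
        # Nothing changed: send headers only and skip rendering entirely.
        start_response("304 Not Modified", [("Last-Modified", last_modified_header)])
        return []
    body = render_page()
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        ("Last-Modified", last_modified_header),
    ])
    return [body]
```

The check runs before render_page is called, so an unchanged page costs neither rendering time nor bandwidth. The hard part in practice is keeping an honest last-changed timestamp for content assembled from several sources.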

 

Testing your setup

 

Once you have determined the best course for implementing your HTTP compression and conditional GET strategies, you can ensure that your implementation is working with our HTTP Compression and HTTP Conditional Get test tool. Using your robots.txt file, you can test to ensure your configuration is correct and will work with Live Search.

 

Coming up next

 

Now that your pages are compressed and you are telling us when your content is new, we will move on to discussing how to avoid hiding the content you want us to find. As always, if you have additional questions, feel free to ask in our forums.

Jeremiah Andrick — Program Manager, Live Search Webmaster


31 comments
  1. Anonymous

    Could you please provide that graph under a permissive license such as Creative Commons, so that we can use it elsewhere to promote site owners implementing this?

  2. Anonymous

    If the website is mainly built on Flash content, then HTTP compression doesn’t give any significant reduction in size, right?

  3. rickdej

    @all Thanks for the comments

    @smokie .swf files are already compressed and HTTP compression will offer no additional value. But there are other issues with Flash. Be sure to check out Part 3 in the series.

    Jeremiah Andrick

  4. Anonymous

    How do I enable HTTP compression for a website built in Yahoo Stores?

  5. Anonymous

    OK, good job with the info.

    B

  6. Anonymous

    What's your comment on shared hosting environments (approx. 92% use them)?

    This article will not work there, as we have no control over the server.

    Could you guys provide any info on the subject?

    Alok Tiwari

    India

  7. Anonymous

    I would like to ask you a question. How is site optimization different for Live Search and for Google?

    Won't the search engines take care of our website automatically if it is good?

  8. Anonymous

    Do you have instructions for the Cpanel?

  9. Anonymous

    How can my website get rank in MSN?

  10. Anonymous

    MSN rank does not evaluate as much as Google does! forget msn ranking rush to get google ranking automatically other search engine ranking will extend.

    Best SEO practice at Google webmasters!

  11. Quality Directory

    I'm very concerned about bandwidth control, and I've set up HTTP compression.

  12. markamoment

    How do I enable HTTP compression for a website built on FREEWebs?

  13. longsapa

    This is so useful information

    Many thanks

  14. LeonTheGreat

    What would you do for a dynamic website that uses ASP.NET? I don't have direct access to the server configuration either – is there a way to set this up on the client-side?

  15. angelvoyagera

    HTTP Conditional Get not enabled (Live.com)

    HTTP status code: 304 Not Modified

    HTTP conditional GET: enabled

    URL generates an ETag value: 4a2770f0-697e-530a1c00

    HTTP compression: not enabled (HTTP compression can be enabled for non-304 URLs)

    HTTP headers:

       Connection: close

       Vary: Accept-Encoding

       Cache-Control: max-age=86254, public

       Date: Fri, 10 Jul 2009 14:51:37 GMT

       Expires: Sat, 11 Jul 2009 14:49:12 GMT

       ETag: 4a2770f0-697e-530a1c00

       Server: Apache/2.2.3 (CentOS)

    can someone help me

    thanks.

  16. get backlinks

    nice tutorial thanks

  17. insain4

    How much bandwidth can be saved by implementing these options?

  18. Anonymous

    Welcome to ours web site ;-))!!!

  19. TheNutri.com

    See, I updated the HTTP compression on iis but HTTP Compression tool is still giving me the following:

    HTTP compression: not enabled

    Whats the issue?

    my site url is http://thenutri.com

  20. miles2go

    Bing nice work continue's.

  21. allsportssunglasses

    what is this and how will it help my blog an web site.

       Thanks

  22. Anonymous

    very thanks for article!

  23. wendyswalkers

    I have the same problem. I need help! Bing in the last 4 months has only 31 pages up.

    URL: http://www.wendyswalkers.com

    HTTP status code: 200 OK

    HTTP conditional GET: not enabled for the date selected

    HTTP compression: not enabled

    HTTP headers:

       Content-Length: 68594

       Content-Type: text/html

       Date: Sat, 05 Sep 2009 00:40:47 GMT

       Set-Cookie: vsettings=; expires=Mon, 30-Aug-2010 07:00:00 GMT; path=/,ASPSESSIONIDQQRRBTCQ=HFGNHHHALOCNDMDDFFJFLAFM; path=/

       Server: Microsoft-IIS/6.0

       X-Powered-By: ASP.NET

       Cache-control: private

  24. abbaszn

    hi

    How do I enable HTTP compression for my website?

    thanks.

  25. djachao

    Hi,

    how can I enable this on a shared account? I don't have access to the root server to have this enabled there. My site is hosted on dreamhost, and after setting up my sitemap and everything, bing only indexes one single page, only 1 page for each of two of my sites. I'm really lost, any help will be greatly appreciated: my sites are http://www.mcquadr.at , http://www.mcsquare.me + subdomains.

    Thanks

  26. businessbyweb

    thanksss very much

  27. CarnalRay

    mcsquare, you can compress on a shared server; you need a .htaccess file (it should be in the root of your website)

  28. hughww

    Thanks – good information

  29. wickfree98275

    I dont get it:( Im gonna have to read more

  30. tech.valyoo

    We have enabled http compression but still page load time is high. please suggest some other factors for low page time.

    Thanks
