Announcing crawler improvements for Live Search

Today we’re pleased to announce several improvements in the crawler for Live Search that should significantly improve the efficiency with which we crawl and index your web sites. We are always looking for ways to help webmasters, and we hope these features take us a few more steps in the right direction.

  • HTTP Compression: HTTP compression shortens transmission time by compressing static files and application responses, reducing network load between your servers and our crawler. We support the most common compression methods, gzip and deflate, as defined by RFC 2616 (see sections 14.11 and 14.39). Compression is currently supported by all major browsers and search engines. Use this online tool to check your server for HTTP compression support.

    The following links provide configuration information for IIS and Apache.
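The negotiation described in this bullet can be sketched in a few lines of Python. This is only an illustration, not how IIS, Apache, or the Live Search crawler implement it: the function names `choose_encoding` and `compress_body` are hypothetical, and the sketch assumes the server sees the client's Accept-Encoding header as a plain string. It picks gzip or deflate according to that header and sets the matching Content-Encoding header on the response.

```python
import gzip
import zlib


def choose_encoding(accept_encoding: str) -> str:
    """Pick gzip or deflate from an Accept-Encoding value, else 'identity'."""
    offered = {}
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0  # a coding listed without a q-value is fully acceptable
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        offered[token] = q
    # gzip is listed first, so it wins when both codings have the same q-value.
    candidates = [(offered.get(name, 0.0), name) for name in ("gzip", "deflate")]
    best_q, best = max(candidates, key=lambda c: c[0])
    return best if best_q > 0 else "identity"


def compress_body(body: bytes, accept_encoding: str):
    """Return (body, headers), compressed if the client accepts gzip or deflate."""
    encoding = choose_encoding(accept_encoding)
    headers = {"Vary": "Accept-Encoding"}
    if encoding == "gzip":
        body = gzip.compress(body)
    elif encoding == "deflate":
        body = zlib.compress(body)  # HTTP "deflate" is the zlib-wrapped format
    if encoding != "identity":
        headers["Content-Encoding"] = encoding
    headers["Content-Length"] = str(len(body))
    return body, headers
```

For example, `compress_body(page, "gzip, deflate")` returns a gzip-compressed body with `Content-Encoding: gzip`, while a request without an Accept-Encoding header gets the body back unchanged. In practice the web server's own compression module would normally handle this for you.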

  • Conditional Get: We support conditional get as defined by RFC 2616 (Section 14.25). Generally, we will not download the page unless it has changed since the last time we crawled it. As per the standard, our crawler will include the “If-Modified-Since” header with the time of the last download in the GET request and, when available, the “If-None-Match” header with the ETag value. If the content hasn’t changed, the web server will respond with a 304 HTTP response.

    To check if your site already supports the “If-Modified-Since” HTTP header, you can use this online tool to check your server for HTTP Conditional Get support. Alternatively, you can check using Fiddler for Internet Explorer, or Live Headers for Firefox. Each of these tools allows you to create a custom GET request and send it to your server. You’ll want to make sure that your request includes the “If-Modified-Since” header like the following simplified sample:

    GET /sa/3_12_0_163076/webmaster/webmaster_layout.css HTTP/1.1
    Host: webmaster.live.com
    If-Modified-Since: Tue, 22 Jan 2008 01:28:49 GMT

    You should receive a server response similar to the following simplified sample:

    HTTP/1.x 304 Not Modified

    Check out MSDN for more information on using Fiddler for performance tuning.

    If you have not yet configured conditional get on your site (e.g. on IIS or Apache), we strongly encourage you to do so. It can significantly reduce server load, since most browsers and crawlers already support this feature.
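For sites that generate pages dynamically, the server-side decision behind a 304 response can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the logic used by IIS, Apache, or the crawler: the function name `conditional_get_status` is hypothetical, the application is assumed to know each page's last-modified time (as a timezone-aware datetime) and its ETag, and ETags are compared as plain strings, which ignores weak validators.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(request_headers, last_modified, etag):
    """Return 304 if the client's cached copy is still current, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present, If-Modified-Since is ignored.
        tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if "*" in tags or etag in tags else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: treat the request as unconditional
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200
```

With the sample request shown earlier, a page last modified at 01:28:49 GMT on 22 Jan 2008 yields 304, and the server sends the status line and headers without a body. Any later modification yields 200 and the full page.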

In addition to these two features, there are many more performance improvements that should help further optimize our crawling. We’ve also upgraded our user agent to reflect the changes; it is now “msnbot/1.1”. If you think you are experiencing any issues with MSNbot, or have any questions about the updates, please use our Crawler Feedback & Discussion form.

– Fabrice Canel, Live Search Crawling Team


90 comments
  1. Anonymous

    I am curious how this new design affects the data usage. Are there numbers known what the reduction is in percentages?

  2. Anonymous

    How can the Conditional Get be done when your site is delivered dynamically by Apache/PHP using mod-rewrite functions for creating static URLs? From everything that I can tell, each page is new at the time of request. Is this the intended result that the new MSNbot 1.1 is looking for?

  3. Anonymous

    These improvements arrive a little bit too late but at least they’ve finally arrived. I wonder when true competition in the search market will arrive too! For instance, I again ask you guys at Live Search: when are you (for "you" I mean G, Y and M) going to finally deliver a decent image search tool? Image search is a hundred steps behind text search in any engine. The one who delivers image-based (instead of text-based) image search will most certainly change the market rules… But I don’t see any SEO company or Search Engine Blog mention image (and video, and music, and…) as of strategic importance whatsoever.

  4. Anonymous

    thank you very much for this article

  5. Anonymous

    Google and Orbitz both use gzip compression to deliver compressed versions of their pages to HTTP 1.1-compliant browsers. Google.com has been compressing for a long time. This improvement comes too late for Live Search! BTW, what’s the compression rate? Google’s typical savings on compressed text files range from 60% to 85%, depending on how redundant the code is.

  6. Anonymous

    Normally, how long does it take for a newly submitted website to be indexed by the MSNBOT crawler? It might be an interesting topic, which I wish to discuss on my website.

    Thanks

    David Cheong.

  7. Anonymous

    Thank you for your improvements!

  8. Anonymous

    Thanks so much for this! This is exactly what I was looking for.

  9. Anonymous

    Great job,

    But don’t get too comfortable now, there is still a lot that has to be improved.

  10. Anonymous

    Thanks for your guide in Conditional Get!

    Regards,

  11. Anonymous

    MSNBOT crawler is currently supported gzip! Thank you for your improvements!

  12. Anonymous

    I find that our site has a lot of pages crawled by MSNBot that show as having high ranks. However, actual searches do not show them. What could be the problem?

  13. Anonymous

    The latest crawler improvements for Live Search are a welcome move and should offer a lot of utility and ease to webmasters in their endeavour to improve their website performance on both counts:

    First, improving the quality of their site, thereby offering convenience and quality to the visitor.

    Second, improving the content of their web site by increasing the feedback that they receive from Live.com.

  14. Anonymous

    Thanks for your guide.

    Regard for you! :)

  15. Anonymous

    thank you very much for this article

  16. Anonymous

    Still a lot of improvement to be made, but this is definitely a step towards better engines running and better results for the users.

  17. Anonymous

    just trying to learn this stuff on my own

  18. Anonymous

    This is an excellent resource. Thanks

  19. Anonymous

    I made an Office Live web site and submitted it to all search engines. The only one that does not let me pull it up by the company name is yours. Why is this, since it is a Microsoft site? My company name is [ J. Orr Eq. Sales ]. If you can help me, please do. My email is [ joescarts@live.com ]

  20. Anonymous

    This is an excellent tool. cheers,

  21. Anonymous

    Excellent tool, I’m glad we have MSN LIVE!

  22. Anonymous

    I’m newbie …. so, I will learn more

  23. Anonymous

    Time to look up related items and try and work out if I can do it. thanks for the update

  24. Anonymous

    Thanks for the article and compression checker. We have seen considerable improvement on traffic by leveraging compression. Also look at mushit.com for image compression. Very simple and effective tool

  25. Anonymous

    Great tool. Really useful and adds a little bit extra to help us improve our sites.

  26. Anonymous

    I am trying to get my website listed on MSN and they have asked me to insert a code into my website header for verification. I have done this and repeated it several times and they still keep saying they cannot find my site. Any ideas would be welcome. It is an Office Live site (.aspx)

  27. Anonymous

    I don’t understand why a remote server said my site had an error???? The site is up and running

  28. Anonymous

    why can;t i find my web address

  29. Anonymous

    Laptop repair website with blog and forum

  30. Anonymous

    how this webmaster may enchanced my web

  31. Anonymous

    Is there a workaround if my host HTTP Compression is not enabled? Does it affect ranking if this features is not implemented on host?

    Thanks,

    Srednarb Group

  32. Anonymous

    I haven’t seen msnbot crawl yet my site

  33. Anonymous

    Great info. I am always trying to learn all I can about crawler bots to improve my seo. Thanks

  34. Anonymous

    This is good for me as webmaster, I’m looking good crawling….

  35. Anonymous

    it was really nice blog and more informatics about faster crawlers.

  36. Anonymous

    Hey man your crawler is not working … 1 month old url submission still not crawled.

  37. Anonymous

    This is very useful info for webmasters – at least I got HTTP compression working now.

    Thank you

  38. Anonymous

    I’m always up for a little performance tuning for my WordPress based sites, or all sites for that matter, so this was a good little article to stumble across. Thanks.

  39. Anonymous

    my site is up and runing for last 6 month, i added it to live search 2 month ago, still live search does not recognize my site? !!!

    http://www.empireestateco.com

  40. Anonymous

    Dear webmasters, when blacking out my setup I find that some of the text does not appear on the buttons. Q. Is this a flaw with Win.XP or am I missing something?

  41. Anonymous

    modifications are welcome…

    But i always wonder why my site receives very less organic traffic from live as compared to Google?

    What is wrong with my site in view of live?

  42. Anonymous

    Hi

    Why my site very late to crawl by MSN Crawler ?

    Thanks

  43. Anonymous

    i cant find my homepage in live search

  44. Anonymous

    It only crawls 1 post on my blogs if even that. I have hundreds.

  45. Anonymous

    Thank you for the improvements. This is good news for our servers! Keep it going!

  46. Anonymous

    Great to hear that Live search team is doing something now. Hope my site: http://www.easytourchina.com will get reindexed soon. It is a 10-year-old site which disappeared from live search for a couple of years.

  47. Anonymous

    Live does seem to index my site but I have yet to be able to get it to verify my site….No such problem with Google and Yahoo

  48. Anonymous

    YOUR BUSINESS IS VERY GOOD I MUST CONFESS

  49. Anonymous

    Is it possible to implement HTTP compression for PHP displayed pages? I mean if the pages are pretty much static, only loaded via PHP, what command could I issue using, I would figure, the header() command, in order to provide HTTP compression from PHP?

    Victor

  50. Anonymous

    little msnbot in my site and i want to know why

    thank you for msn live seaech

  51. Anonymous

    thanks for this articles..it helps a lot..

  52. Anonymous

    I used to be on the MSN Live search, but for the last month I am not, I wonder what the problem is.

  53. Anonymous

    Does anyone know if sitemap.xml is supported by MSN for sure?

  54. Anonymous

    Thanks for this article. i using mod_deflate.

  55. Anonymous

    Yes sitemap.xml is now supported by MSN

  56. Anonymous

    What’s it exacly

    {

        URL generates an ETag value: "1edec-3e3073913b100"

    }

  57. Anonymous

    Thank you for the improvements. sitemap.xml is now supported by MSN

  58. Anonymous

    After read this article, I know that sitemap.xml support MSN. Thank a lot for this information

  59. Anonymous

    My site did’nt crawl to msn help me..

  60. Mr.noom

    Thanks. for your articles

  61. mnlpn

    getting 302 error…  don't understand why

  62. Anonymous

    This section of article was quite informative.. give a clear idea about the subject.. really it was very helpfull

    MSNBOT crawler is currently supported gzip — this is a good information too.. thanks

    Thanks again

    Sachin

  63. Quality Directory

    I will be pleased when Bing crawler and indexer improves like GoogleBot and Yahoo Slurp.

  64. carlosspaul

    Does anyone know if you can force "If-Modified-Since" in the header by using an .htaccess file as most people including myself can't afford a server ourselves with superfast broadband access and are forced to buy hosting? I'm using a wordpress blog. Thanks for any help in advance.

  65. modesto

    I see that many of you have had a problem with the Bot not indexing your site. One big problem with this may be that you are submitting the site through too many submission services. This is considered spamming. I never submit a site and all of my sites are indexed within a couple days.

  66. Anonymous

    Interesting article

  67. Anonymous

    thanks for this great article

  68. Anonymous

    well thanks for such a informative article

  69. Anonymous

    Thank you very much for this article.

    Regards

    James

  70. autotoolshop

    I am happy to know that. Bing crawled 25 pages of my website over the previous one year and I wish he can work harder.

  71. dailydemotivators

    Snap – I can't modify server options at blogspot :/

  72. dumri010

    how to submit sitemap on bing plz help me

  73. D3M-TEAM

    Thank You

  74. Anonymous

    msnbot/2.0b's compression seems to be broken and causing double requests:

    65.55.207.99 - - [26/Oct/2009:18:26:37 +0000] "GET / HTTP/1.1" 200 2917 "-" "msnbot/2.0b (+search.msn.com/msnbot.htm)"

    65.55.207.99 - - [26/Oct/2009:18:26:42 +0000] "GET / HTTP/1.0" 200 10327 "-" "msnbot/2.0b (+search.msn.com/msnbot.htm)"

    When compression is disabled for that User-Agent, it goes back to a single request via HTTP/1.1.

    Conclusion: msnbot is a broken piece of non-standards-compliant crud!

    For Apache admins who care enough to fix this instead of simply blocking msnbot entirely (either solution is equally good, IMO):

    BrowserMatch ^msnbot no-gzip

  75. Anonymous

    Thanks for share good post, useful information.

  76. radioalmeidense

    Thank you very much for improving the indexing system.

  77. asdalorde

    Thank you for this article.

  78. expertmcth

    I try to referencing an official site http://www.expert-mcth.fr on bing but bing are only in english and i think for the french market that not corresponding of the french law and market that make 1 month and the crawler of bing aren't pass visiting my site for me bing have a lot of problem for referencing french site.

    For info expert mcth are one the official site of Afnic the french .fr registration on Legal entities

    ( http://www.afnic.fr/…/personnes-morales ) or in european commission trade (trade.ec.europa.eu/…/details.cfm)

    Where is the bing link for this site.

  79. sandit27

    Thank you for the improvements! This is great!

  80. metawing

    how i can veryfy my status

  81. Ideas Moderator

    Will these mean that Bing will be as fast as Yahoo and Google in terms of Indexing websites, especially blogs?

  82. loly1900

    MSNBOT crawler is currently supported gzip! Thank you for your improvements!

  83. lvproject

    yahoo more better than bing

  84. gordon.pryra

    Still stuck on a single page indexed.

    My site changes faster than Bing indexes it, which is a problem…

  85. drmarhendraputra

    the first question in my head is why bing only index 3 of all my pages?

  86. ibrahimarslan191

    Thank you very much for this information. I hope I understood the info clearly.
