Now that we have a beta, people are starting to pay attention to whether their sites are in our index. The two most common questions we get are (1) why did you not crawl my site, and (2) you crawled page X, but it’s not in your index; why? Let’s take these one at a time.
Why did MSNBot not crawl my site? The answer is not straightforward, so I will mention a few things worth considering. The first is whether your pages are crawler friendly. An example of a URL that might look “unfriendly” to a crawler is this one: http://www.somesite.com/info/default.aspx?view=22&tab=9&pcid=81-A4-76&section=848&origin=msnsearch&cookie=false. When MSNBot looks at this URL it gets scared (well, not really; it’s a machine, not a human, so it doesn’t have feelings). The algorithm starts to wonder whether it is going to get stuck in a loop, endlessly crawling every permutation of the query parameters. Thus, URLs with many query parameters (definitely more than 5) have a very low chance of ever being crawled.

Another thing to consider is whether we can find your page. If MSNBot has to traverse eight pages on your site before reaching leaf pages that nobody but you links to, it might choose not to go that far. This is why many people recommend creating a site map, and we would as well. Finally, you can also submit your URL to MSN Search directly using our URL submission tool.
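If you want a rough self-check of your own site against the two issues above, here is a minimal Python sketch. It is not how MSNBot works internally. The threshold of 5 parameters is only the rule of thumb from this post, and the function names `count_query_params` and `click_depths`, along with the small example site, are hypothetical. `count_query_params` counts the query parameters in a URL, and `click_depths` runs a breadth-first search over your site’s links to find how many clicks each page is from the home page.

```python
from collections import deque
from urllib.parse import urlparse, parse_qsl

# Assumed threshold: the post only says URLs with "definitely more than 5"
# query parameters are unlikely to be crawled; this is not a documented limit.
MAX_QUERY_PARAMS = 5


def count_query_params(url: str) -> int:
    """Return the number of query parameters in a URL."""
    query = urlparse(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the home page; returns the minimum number
    of clicks needed to reach each reachable page from `home`."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


if __name__ == "__main__":
    url = ("http://www.somesite.com/info/default.aspx?view=22&tab=9"
           "&pcid=81-A4-76&section=848&origin=msnsearch&cookie=false")
    n = count_query_params(url)
    print(n, "parameters:", "risky" if n > MAX_QUERY_PARAMS else "ok")

    # Hypothetical site where the leaf page is buried in a chain of links.
    site = {
        "/": ["/a"],
        "/a": ["/b"],
        "/b": ["/c"],
        "/c": ["/leaf"],
    }
    print("leaf depth without site map:", click_depths(site, "/")["/leaf"])

    # A site map page linked from the home page puts every page within 2 clicks.
    site["/"].append("/sitemap")
    site["/sitemap"] = ["/a", "/b", "/c", "/leaf"]
    print("leaf depth with site map:", click_depths(site, "/")["/leaf"])
```

In the example, the sample URL has 6 parameters, and linking a site map page from the home page cuts the leaf page’s depth from 4 clicks to 2, which is the point of the site map recommendation.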
Lastly, a brief moment on peanut butter — why is it that we stop liking peanut butter after like 8th grade? Or is it just me? I have not had a peanut butter and jelly sandwich for the longest time. This morning I had one. Yummy. Here’s to peanut butter.
Eytan Seidman, Program Manager