Bing and cloaking detection

Since the inception of the Bing Search team a few years ago, we’ve been maniacally focused on one thing: relevance. By relevance we mean that Bing finds the answer to your question better than any other source, whether that answer is best delivered by a website or by a real-time traffic map. With our recent Fall update, our relevance has improved dramatically, to the point where we think we have the best product in some areas and a highly competitive one in others. (Try it out, I dare you: images, maps, mobile, web.)

One of the biggest challenges with relevance is distinguishing legitimate information from the various forms of search spam. We have made especially good progress here over the last 8 months through a suite of tools that helps us detect, evaluate and manage spam. One of these tools is an extension to BingBot that gives us an additional way to detect cloaking. (It should be noted that not all cloaking is spam related, and we do our best to take this into account; however, we still don’t recommend cloaking in any situation.)

The goal of the tool was simple; however, our implementation had some well-documented shortcomings that affected the reporting metrics of some websites. We have been listening to your feedback over the past couple of months and are continuing to optimize the tool to eliminate these issues:

  • AdSense/Overture reporting – Initially, a bug in our crawler caused it to download all content on your page, including ad blocks. We have since fixed this by blocking requests to Google and Overture, preserving the integrity of your ad reporting.
  • Distort site statistics with unfilterable bot traffic – Webmasters have also reported a high volume of traffic from this bot, in some cases enough to skew their logs in a statistically significant way. We are continuing to optimize the crawler, and most webmasters should see this referrer traffic drop to almost nothing over the next month.
  • Pollute HTTP logs with inappropriate terms – Another unfortunate issue was that our testing used a generic keyword list that was not site-specific. We have tuned this list, and you should no longer see keywords that are unrelated to your site’s content.
  • Microsoft isn’t responding to questions – Webmasters who encountered these problems and reported them to Microsoft were not able to get a satisfactory or timely response. We have created a forum specifically to answer your questions and comments. For sensitive issues, please use our feedback form to contact us privately.

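The post does not say how the BingBot extension decides that a page is cloaked. A common general approach, assumed here purely for illustration, is to fetch the same URL twice (once identified as the crawler, once as a browser-like visitor arriving with a search referrer) and compare what each request received. The sketch below covers only the comparison step; `visible_text`, `looks_cloaked` and the 0.6 similarity threshold are hypothetical names and values, and the two HTTP fetches are omitted.

```python
import difflib
import re


def visible_text(html: str) -> str:
    """Crude tag stripper: drop <script>/<style> blocks, then all tags,
    then collapse runs of whitespace into single spaces."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return " ".join(text.split())


def looks_cloaked(bot_html: str, user_html: str, threshold: float = 0.6) -> bool:
    """Hypothetical check: flag the page when the words served to the crawler
    and the words served to a browser-like request are too dissimilar.

    `threshold` is an illustrative cut-off, not a value taken from Bing.
    """
    bot_words = visible_text(bot_html).lower().split()
    user_words = visible_text(user_html).lower().split()
    # ratio() is 1.0 for identical word sequences and 0.0 when nothing matches.
    ratio = difflib.SequenceMatcher(None, bot_words, user_words).ratio()
    return ratio < threshold


if __name__ == "__main__":
    honest = "<html><body><h1>Cheap flights</h1><p>Compare fares to Paris.</p></body></html>"
    bot_version = "<html><body><p>flights flights cheap flights best flights</p></body></html>"
    user_version = "<html><body><p>Buy discount pills today</p></body></html>"
    print(looks_cloaked(honest, honest))             # same page for both: not flagged
    print(looks_cloaked(bot_version, user_version))  # no shared words: flagged
```

Comparing words rather than raw HTML lets small differences such as rotating ad blocks or timestamps pass, while pages whose crawler and visitor versions share little text get flagged; a real detector would need much more robust rendering and comparison than this.
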
Hopefully webmasters have also noticed these issues disappearing. If you are still experiencing any of them, please contact us before blocking BingBot so that we can try to address the problem.

We’re inspired by the relevancy improvements we’ve made this Fall, and have much more in the works for Spring. Please keep us in mind as you do your searching, and let us know how you think we can make search better.

    Glad to see that Live is doing more for the webmaster community :)

    I know what you were doing stirred up a hornet’s nest; no one really knew what was going on. I was not really concerned: the bandwidth used was minimal, and it was pretty obvious from my log files that this was not bogus traffic meant to boost Live Search’s search share, as some contend.

    There are too many MS haters out there to get an honest picture from a blog or two.

    Keep on doing what you are doing. I showed my uncle, a doctor and medical director at a major Chicago hospital and rehab center, how spot-on Live Search is compared to the others on health issues, symptoms and diagnosis; he instructed all residents and nursing staff to use Live Search.

    What you have done in such a short period is amazing. My suggestion to everyone: when Google gives you a forum entry from 2001 as a top search result, try Live Search, which is much more up to date and relevant than any other.

    My only complaint is Wikipedia. Why do you put it at the top when you could return much more relevant and credible results from Encarta? If most people are looking for an article or document, there are much more reliable resources.

    Besides, by now everyone on the planet knows how to find wiki search and Wikipedia. Why include search results in your search results?

    No one’s perfect but you guys are getting close.

    Keep up the good work!

    Dann404, I’m not a MS hater, I don’t do so called black hat SEO, but my crap-o-meter tells me that this method is, well, um, questionable to put it politely. There’s no need for a huge red herring (aka mass referrer spam) to obfuscate the not that obvious cloaking detection. If that’s the sole method applied to catch cloakers, bye bye search quality.

    I agree with SebastianX, more needs to be done in order to detect cloaking.

    Live Search has improved tremendously over the last little while, but it still has quirks to iron out.

    Hi Nathan,

    I think it’s great that you communicate honestly and clearly about what you’re doing. Nothing but praise for that! Although the early implementation was not so great and was easy to detect, I can see where this is going and I like it. When the referrer becomes indistinguishable from real traffic and you switch IPs often enough, this could be a really good technique for catching cloaked pages. This is good news for honest webmasters. Keep up the good work!


  rickdej

    Thanks for all your comments. I appreciate that you are taking the time to come here and read the posts.

    We hope to use this blog to offer direct feedback and articles affecting webmasters and how they interact with Live Search.


    Jeremiah Andrick

    Program Manager – Live Search

    Yes, Live Search is getting better and doing more for the webmaster community. But this incident concerning the testing of cloaking detection revealed several weaknesses in how you respond to “mess-ups.” I just hope you learn from this experience and react faster next time.

    I agree with the other commenters here that Live Search is improving. But I still wonder how mistakes this big can happen. Could you tell us how you test changes and new tools? To me it would make sense to test a tool like this on a small scale first. Keep up the improvements.

    Some webmasters run ‘bad bot’ detection scripts on their sites and have reported that these quality checks register as referrer spam, so the scripts automatically ban the bots. How do you deal with this? If a site bans the originating IP address, will the site be penalized? This seems like a major flaw to me.

    Quote: (It should be noted that not all cloaking is spam related and we do our best to take this into account, however, we still don’t recommend cloaking in any situation).

    I like this; I have waited a long time for it. I hope the major search engines adopt the same rules on cloaking.

    Nice work, definitely good to prevent cloaking. How is the tweaking going?

    Hi Interaction design!

    Tweaking should be banned.

    Dear Jeremiah Andrick,

    I think that webmasters want to be supported by Live and do not want to depend on only one search engine.

    Hi Quảng! I agree with you. I do not want to be dependent on anything, not only SE.

    Glad to see that Live is doing more for the webmaster community :) I hope Live takes more market share.

    Really helpful info. Let me know more about black hat SEO.

    Even though this started a year ago, it looks like it is still going on. Live, are you going to continue this testing indefinitely?

    I am just wondering why you are the only competing search engine to use this questionable technique…

    I started my new domain recently. Now I get most of my traffic from – thanks anyway :-)

    I have started blocking all those IPs via .htaccess.

    Since very few people use this crappy SE, it’s no loss, and my logs will probably thank me for getting rid of useless referrals.

    @Rainer: This ain’t the traffic you want, since it ain’t a genuinely interested visitor!

    so what’s up with the referrer spamming?

    I understand it’s part of your effort to improve the Live search engine, but how come I don’t see similar spam referrers from other search engines?

    Clearly, this means the spam referrers from the Live search engine are not really necessary.

    It’s become very annoying and irritating.

    I hope you can come up with an algorithm that does not bombard blogs or websites with random queries and whatnot.

    Maybe once this is fixed, I’ll start considering using Live Search. Until then, I won’t even look at it.

    Please please please stop hitting my site with this random nonsense that only benefits yourself.

    Accurate search relevance is what gives a search engine a good reputation. Continue the good work of weeding search spam out of result pages. I still don’t understand why some webmasters use cloaking to serve search bots a different version of a web page than the one served to users’ browsers.

  Quality Directory

    If a search engine cannot distinguish legitimate information from search spam, the relevancy is gone. You're right on track.


  Anonymous

    Anything that installs itself and then makes it so hard to get off your computer is a VIRUS, and Bing is that. Without so much as a by-your-leave, I woke up with Bing as my Firefox address bar search engine and cannot find a way to remove it. Microsoft has become a HACKER site, generating Trojan search hijacker programs and distributing them with MSN, Hotmail and its messenger application. Death to Microsoft. Everyone should rise up and uninstall as much Microsoft software as they can live without.

    Your post is helpful and informative.

