To Err is Human

At Bing we’re committed to delivering you the best possible results. While teams of researchers, machine learning experts and data miners continually improve our core spelling and ranking components, the reality is that some defects have historically slipped through the cracks. This is partly because search relies on people to teach the system and, since people make mistakes, some defects are introduced. Our recent release takes a step forward in addressing many of the most common defects that we find in the system. In this post, my colleague Bill Ramsey, Development Manager for Bing, will examine three categories where we’ve reduced the occurrence rate and severity of defects: URL queries, Recourse Links and Related Searches.

Dr. Harry Shum, Corporate Vice President, Bing R&D

One of the major sources of defects pertains to what we call URL queries. These are queries like “facebook.com” or “yahoo.com/mail” where the query looks like a URL. At first glance, you might think this is a simple problem for a search engine. After all, we have billions of URLs – how hard can it be to find a match on one? In reality, this type of query is quite complicated. Because we’re all human, people use countless spelling variants. For instance, “facebook.com” has over a thousand different variants such as “facebookc.om”, “facbook.com”, and “ww.faceboo.omc”. On top of the spelling errors, people don’t always know the correct URL. For example, the Southwest Airlines site is southwest.com, but some people try “swair.com” expecting to arrive at Southwest’s homepage. We also commonly see permutations of URLs such as “yahoo.com/mail” when the correct URL is “mail.yahoo.com”. Even if we figure out your intent, the multitude of spammers and squatters out there presents another challenge. Spammers prey on variants of top domains like coolmathgames.com (people are actually looking for coolmath-games.com) or URLs that aren’t quite spam like facebooklogin.net (most people just want to log in to facebook.com).

Our defect reduction efforts in this class of queries focused on three main areas:

  • The first was correctly identifying the URLs that we can correct. By identifying the troublesome URLs, we avoid problems such as including spam results like searscardcom.com.
  • The second effort involved expanding our ability to model the types of errors that users make based on how people are using Bing. By recognizing patterns in billions of logs, we are able to fix common spelling errors in URLs.
  • Lastly, we analyzed billions of sessions to find patterns so that users looking for sites like “swair.com” would eventually end up on the intended site “southwest.com”.
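
To make the three areas above a little more concrete, here is a minimal sketch in Python. It is not how Bing's models work; it only illustrates the general idea of mapping a misspelled URL query to a known site by edit distance, with a small alias table standing in for patterns learned from sessions. The names `KNOWN_DOMAINS`, `LEARNED_ALIASES`, `correct_url_query` and the `max_distance` threshold are all assumptions made for this example.

```python
from typing import Optional

# Hypothetical list of trusted domains; a real system would hold far more.
KNOWN_DOMAINS = ["facebook.com", "southwest.com", "mail.yahoo.com"]

# Hypothetical alias table standing in for patterns mined from user sessions.
LEARNED_ALIASES = {
    "swair.com": "southwest.com",
    "yahoo.com/mail": "mail.yahoo.com",
}


def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: counts insertions, deletions,
    substitutions and swaps of two adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "c." typed instead of ".c"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def correct_url_query(query: str, max_distance: int = 3) -> Optional[str]:
    """Return the known domain the query most likely refers to, or None."""
    q = query.strip().lower()
    if q in KNOWN_DOMAINS:
        return q
    if q in LEARNED_ALIASES:
        return LEARNED_ALIASES[q]
    best = min(KNOWN_DOMAINS, key=lambda domain: edit_distance(q, domain))
    # Refuse to correct when nothing is close enough, so that unrelated
    # or spammy look-alikes are not forced onto a trusted domain.
    return best if edit_distance(q, best) <= max_distance else None
```

Calling `correct_url_query("facebookc.om")` or `correct_url_query("facbook.com")` returns `"facebook.com"` in this sketch, while `correct_url_query("swair.com")` resolves through the alias table. A heavily mangled variant such as “ww.faceboo.omc” falls outside the small threshold used here, which hints at why modeling real error patterns from logs matters.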

Another example applies to machine learning models that would consider a query like “facebook login.com” to be equivalent to “facebooklogin.net” even though people’s usage patterns indicate that’s not the intended query. It’s very common for people to type “.com” when they want a “.net” or “.org” domain. It’s also common for people to type queries like “bed bath and beyond.com” when they are looking for bedbathandbeyond.com. Our models have adapted to these patterns and now concatenate the terms and change “.com” to “.net”, resulting in a seemingly unlikely but ultimately reliable interpretation of the query. By looking at several sources of user behavior, we were able to refine our models to deliver what users intended (as with the Facebook example below).

clip_image006_thumb[2]
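
The concatenation and domain-suffix patterns described above can be sketched as a simple candidate generator. This is only an illustration under stated assumptions: `KNOWN_SITES` is a hypothetical stand-in for what usage data says people actually want, “myexamplesite.net” is a made-up site, and the function names are invented for this example. It does not capture the usage-based judgment that a query like “facebook login.com” should lead to facebook.com.

```python
from typing import List, Optional

# Hypothetical set of sites that usage data says people actually want.
# "myexamplesite.net" is a made-up entry for illustration only.
KNOWN_SITES = {"bedbathandbeyond.com", "myexamplesite.net"}

ALTERNATE_TLDS = (".com", ".net", ".org")


def candidate_urls(query: str) -> List[str]:
    """Join the query terms, then also try swapping the domain suffix."""
    joined = "".join(query.lower().split())
    candidates = [joined]
    for tld in ALTERNATE_TLDS:
        if joined.endswith(tld):
            stem = joined[: -len(tld)]
            candidates.extend(stem + other for other in ALTERNATE_TLDS if other != tld)
            break
    return candidates


def resolve(query: str) -> Optional[str]:
    """Return the first candidate that is a known site, or None."""
    for candidate in candidate_urls(query):
        if candidate in KNOWN_SITES:
            return candidate
    return None
```

In this sketch, `resolve("bed bath and beyond.com")` finds the site by concatenation alone, and `resolve("my example site.com")` only succeeds after “.com” is swapped for “.net”. Checking candidates against known sites is the part that real usage signals would have to replace.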

Removing Superfluous Recourse Links

One of the critical components of a search engine is the query alterations component that performs spelling and query expansion. Spelling will correct the hundreds of misspellings for “Schwarzenegger” while query expansion will take a term like “store” and allow synonyms like “shop” to be used when appropriate to produce better results.

Recourse Links are the phrases that occur underneath the query box that indicate that we changed your query due to spelling or expansion and give you the ability to turn off all of our alterations as a recourse. For example, if you type “qyotes about success and hersos” we’ll show “Including results for quotes about success and heroes” but will give you the ability to only show results for “qyotes about success and hersos” in cases where we may have made the wrong assumption about your intent. The same holds true for expansions in cases where the expansions have significantly altered the results.

clip_image008_thumb[2]

In the past, we showed synonyms as part of our Recourse Links, which opened up some surface area for embarrassing alterations that were either off-topic or superfluous. The query “define interesting” is an example where the Recourse Link was unnecessary: even though expanding “define” to “definition” helped the ranker surface a better match, showing the link didn’t lead to a better user experience.

image_thumb[9]_thumb[1]

While we’ve removed the Recourse Links in cases where we are very confident that they add little value or distract users, we will continue to show them when there’s a chance our query modifications are not what the user actually wanted. When Bing alters some of the words in the query to synonyms, the Recourse Link may not add as much value, so we have changed the text color to black to make it more subtle and have provided a better explanation for the user, as in the example below:

image_thumb[10]_thumb[1]

We will continue to look for the right balance between letting users specify exactly how the engine should interpret their query and avoiding a Recourse Link that simply gets in the way.
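
As a rough illustration of that balance, the decision of whether to show a Recourse Link could be sketched as below. Everything here is an assumption made for the example: the `Alteration` type, the `undo_likelihood` score (a hypothetical estimate of how likely the user is to want the original query back) and the `min_value` threshold are not taken from Bing's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alteration:
    original: str
    altered: str
    kind: str                     # "spelling" or "synonym"
    undo_likelihood: float = 1.0  # hypothetical score in [0, 1]


def recourse_link(alt: Alteration, min_value: float = 0.5) -> Optional[str]:
    """Return the Recourse Link text, or None when no link should be shown."""
    if alt.altered == alt.original:
        return None  # nothing was changed, so there is nothing to undo
    if alt.kind == "synonym" and alt.undo_likelihood < min_value:
        return None  # expansion helps ranking, but the link would only distract
    return f"Including results for {alt.altered}"
```

With these assumptions, the misspelled query “qyotes about success and hersos” still produces “Including results for quotes about success and heroes”, while a low-value synonym expansion such as “define” to “definition” produces no link at all.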

Improved Related Searches

Related Searches are a set of queries that might be related to your initial query. We used to show Related Searches on the left side of the search results but are now showing them on the right side of the search results and occasionally in-line with the web results.

image_thumb[11]_thumb[1]_thumb

Sometimes our query expansion systems will cause the Related Searches to be off-topic. One such case is shown below where “AMD” gets expanded to some form of macular degeneration causing an unexpected set of results to be shown to users. By improving our relevance models we were able to fix several of these sorts of defects.

image_thumb[12]_thumb[1]_thumb

Another example is the query “aol.com”. While our original results were reasonable, our systemic algorithmic improvements produced a much cleaner and more relevant set of suggestions for our users.

image_thumb[13]_thumb[1]_thumb

We’ve also made many other improvements to Related Searches beyond relevance: better formatting (e.g. “KSN WeatherLab” → “KSN Weather Lab”), fewer duplicated terms (e.g. “Disney Channel Games Games” → “Disney Channel Games”), and no adult content under moderate or safe search settings.
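
The two formatting fixes above can be approximated with a few lines of Python. This is a simplified sketch, not the production logic, and the function names are invented for this example. Both rules are deliberately naive: splitting on a lowercase-to-uppercase boundary would also break names like “iPhone”, and dropping repeated words would damage names like “Duran Duran”, so a real system would need exceptions.

```python
import re


def split_fused_words(text: str) -> str:
    """Insert a space where a lowercase letter is directly followed by an uppercase one."""
    return re.sub(r"([a-z])([A-Z])", r"\1 \2", text)


def drop_repeated_words(text: str) -> str:
    """Remove a word that immediately repeats the previous word (case-insensitive)."""
    cleaned = []
    for word in text.split():
        if not cleaned or word.lower() != cleaned[-1].lower():
            cleaned.append(word)
    return " ".join(cleaned)


def clean_suggestion(text: str) -> str:
    """Apply both cleanups to a Related Searches suggestion."""
    return drop_repeated_words(split_fused_words(text))
```

Here `clean_suggestion("KSN WeatherLab")` gives `"KSN Weather Lab"` and `clean_suggestion("Disney Channel Games Games")` gives `"Disney Channel Games"`.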

Conclusion

The beauty of search is that it relies on people to teach the system and, as such, there will always be defects. Our hope is to reduce the rate and severity of defects so you can do less searching and get more done.

– Dr. William Ramsey, Principal Development Manager, Core Search, Bing R&D