Helping Search users correct misspelled queries is super important for two main reasons: a) 5 billion crawled docs and a bleeding-edge ranking algorithm can’t do much if the query isn’t spelled right, and b) more than 10% of all searches are misspelled! So we made sure our new search engine included a revamped spelling correction system that’s much better than our old one.
To improve the speller, we worked with Silviu Cucerzan and Eric Brill from Microsoft Research’s Text Mining, Search and Navigation Group. Silviu and Eric have developed some novel techniques for using search query statistics and iterative transformation of query strings to improve spell correction. Their published paper on this topic, “Spelling correction as an iterative process that exploits the collective knowledge of web users”, goes into much more detail on some of the technical thinking that inspired the spelling correction system we built.
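To give a rough feel for the idea of iteratively transforming a query using query statistics, here is a minimal Python sketch. It is not our production speller and not the exact algorithm from the paper: the token counts are made up for illustration, the function names are hypothetical, and it corrects one word at a time rather than looking at the whole query string. Each round, a token moves to its most frequent neighbor within one edit, and the process repeats until nothing changes.

```python
from collections import Counter

# Made-up token frequencies standing in for real query-log statistics.
QUERY_LOG_COUNTS = Counter({
    "chesapeake": 900, "cheaspeake": 3,
    "survive": 1200, "surive": 5,
    "student": 2500, "studend": 4,
    "wall": 5000, "coverings": 300,
})

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def best_neighbor(word, counts):
    """Most frequent form among word and its logged neighbors.

    The word itself is listed first, so it wins any tie.
    """
    candidates = [word] + sorted(w for w in edits1(word) if w in counts)
    return max(candidates, key=lambda w: counts[w])


def correct_token(word, counts=QUERY_LOG_COUNTS, max_rounds=5):
    """Repeat one-edit improvements until the token stops changing."""
    for _ in range(max_rounds):
        better = best_neighbor(word, counts)
        if better == word:
            break
        word = better
    return word


def correct_query(query, counts=QUERY_LOG_COUNTS):
    """Correct each whitespace-separated token independently (a simplification)."""
    return " ".join(correct_token(tok, counts) for tok in query.lower().split())


print(correct_query("cheaspeak wall coverings"))  # chesapeake wall coverings
```

In this toy data, “cheaspeak” is two edits away from “chesapeake”, so a single one-edit pass can’t fix it. The first round moves it to the slightly more common misspelling “cheaspeake”, and the second round reaches “chesapeake”. That stepping through intermediate misspellings is the part the sketch is meant to illustrate.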
We also got help from others, including Greg Hullender, a Search engineer who previously worked on some of the earliest spell correction systems and who more recently architected the handwriting recognition algorithms for Microsoft’s Tablet PC software. My main contribution to the spelling project was being a uniquely terrible speller. This “talent” enabled me to rigorously put the speller through its paces before we made it public.
Here are a few examples of misspelled searches plucked from the query logs for which our new speller does much better than the old one:
• cheaspeake wall coverings
• how to surive in alaska
• federal plus studend loan program
Although it is a significant improvement for MSN Search users, the new spelling correction system is still far from perfect. User feedback points to a few areas where we have room for improvement. For example, we often suggest corrections for correctly spelled names, which we are seeing a lot during our beta phase because many users search for their own name to evaluate our new search engine. There are also cases where we’re not making simple corrections, such as “Helo 2” or “caffine levels of sodas”. (Yes, you guessed it, both of these errors were sent to me by folks on the Halo 2 team.)
An accurate search spelling correction system is essential to helping users find the information they’re looking for. So in the coming months we’ll be working to increase the accuracy of the spelling correction system. We’ll also be expanding the system to include many more languages, since the English and German-speaking users of our search engine aren’t the only imperfect spellers on the planet. Please continue to send us feedback via the “help us improve” link when you encounter the spelling system not doing the right thing. We try diligently to read everything that comes in!
Lastly, if you haven’t yet seen the documentary Spellbound, which was in theatres about 2 years ago, you should drop everything and rent it! It follows a bunch of students from all over the USA competing to win the 1999 National Spelling Bee. Spellbound is an inspiration to bad spellers everywhere.
– Oliver Hurst-Hiller, Program Manager