SIGIR + Microsoft Research = LOVE

Part of what makes working at Microsoft fun is the abundance of “smartypants” people that I get to work with. The downside is that Bing gets a workout as I casually use my PC to look up words as I talk to them.

This week at SIGIR (Special Interest Group on Information Retrieval – now you know why they abbreviate it) in Boston, our own Microsoft Research group is showing off some future-looking innovations in the search and information retrieval field. In fact, 21 of the 74 papers accepted at this year’s conference have at least one author from Microsoft Research. Not that we’re competitive or anything.

While you could go to the site and check out all the papers that are being presented, I’ll make it easier for you and recap the things that I think are especially cool as they relate to the future of search.

First, people talk a ton about “context” in search. Why is this so important? Think about when you talk to a buddy and the conversation goes like this:

You: “Hey have you seen the new Transformers movie?”

Her: “No. But I hear there are lots of explosions.”

You: “Me too. I love explosions. Almost as much as robots.”

Her: silence.

You: “Do you want to see it?”

More than likely she will understand that even though the discussion digressed into the relative merits of explosions and robots, you are asking if she wants to see the movie with you. A search engine doesn’t do well with this type of conversation today. You constantly have to repeat yourself and give it hints (although this sounds like my wife’s complaints when talking to me). A few Microsoft Research folks got together to work on something called “Context-Aware Query Classification,” which is an attempt to imbue search engines with more of this very human attribute – namely, being able to understand the gist of a question given what one knows about the person or the previous conversations.
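To make the idea concrete, here is a toy sketch in Python of what "remembering the conversation" could look like. To be clear, this is not the method from the Microsoft Research paper; it is a made-up illustration. The keyword table, the function name classify, and the sample session are all assumptions invented for this example.

```python
# Hypothetical keyword-to-topic table, invented for this sketch.
# A real system would learn this kind of mapping from data.
TOPIC_KEYWORDS = {
    "transformers": "movies",
    "movie": "movies",
    "soldering": "electronics",
}

def classify(query, history):
    """Return a topic for the query, falling back on earlier queries.

    The current query is checked first, then the session history from
    most recent to oldest. Returns None if nothing matches.
    """
    for text in [query] + list(reversed(history)):
        for word in text.lower().split():
            if word in TOPIC_KEYWORDS:
                return TOPIC_KEYWORDS[word]
    return None

# The earlier "conversation" with the search engine (made-up queries).
session = ["new transformers movie", "explosions and robots"]

# On its own, "want to see it" says nothing about movies.
topic_without_context = classify("want to see it", [])

# With the session history, the movie topic carries over.
topic_with_context = classify("want to see it", session)
```

Without the history, the vague follow-up gets no topic at all; with it, the earlier movie query fills in the gap, which is roughly what your buddy does without thinking.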

Next on the hit list: breaking news. Here’s a brainteaser for you: If I show ten people a list of links for the query “chocolate” and ask them which one link they would choose that best describes chocolate, I’ll likely get a pretty consistent distribution. In other words, I’ll be able to statistically model which link is ‘most relevant’ to that audience. This works well for known terms and is certainly one of the factors Bing uses to figure out what link to display first for queries like “smile mirror.” However, what about things that just happened, like “New Zealand Earthquake,” where there is no historical data to use to figure out which results to show and where to show them? What do you do, smart guy, what do you do?
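The "ask ten people" idea can be sketched in a few lines of Python. This is a minimal illustration only, not how Bing actually ranks anything; the function name rank_by_votes and the sample picks are assumptions made up for this example.

```python
from collections import Counter

def rank_by_votes(votes):
    """Rank links by the share of people who picked each one."""
    counts = Counter(votes)
    total = sum(counts.values())
    # most_common() returns links ordered by count, highest first.
    return [(link, n / total) for link, n in counts.most_common()]

# Hypothetical picks from ten people for the query "chocolate".
chocolate_votes = ["wiki"] * 6 + ["recipes"] * 3 + ["shop"]
ranking = rank_by_votes(chocolate_votes)

# A brand-new query has no picks yet, so there is nothing to rank.
breaking_news = rank_by_votes([])
```

With history, the top link falls out of the counts. With an empty history, the function has nothing to return, which is exactly the breaking-news problem.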

Well if you were the smart guys in Microsoft Research, you would have worked on a prediction algorithm for news-related queries that attempts to figure out what and when to display certain content. And that’s what they’re talking about this week at SIGIR.

Lastly, there’s the challenge of finding the needle in the haystack. In this case, the haystack is all the questions and answers in online communities and the needle is that one good answer you’re looking for. We know that many of you spend a lot of time in user-generated content communities; from knitting to skiing, computer software to graphic arts, there are message and discussion boards in which passionate people from across the world engage in conversation at all hours of the day. We also know that people’s happiness with their online communities is related directly to how easy it is to find high-quality answers to their questions. The good scientists at Microsoft Research used something called “analogical reasoning” to try to find the best answer to a question and help folks cut through all the bad answers and spam that pervade these boards. They ran the test on 30 million question-and-answer threads (imagine that poor intern) and had good results. Bottom line: being able to find a good answer in online communities to “I’m having problems in soldering some 8MSOP chips to my vero board” might just get easier.

There’s a ton more happening this week at SIGIR. If you’re near the Sheraton in Boston this week, just run into the lobby and yell out “distributional measures for lexical similarity stink compared to kernel methods for classification!” and watch the ensuing brawl.

Stefan Weitz – Bing, Director

Update from the event!

The ever-awesome Susan Dumais has received the Gerard Salton Award from ACM SIGIR. The Salton Award is presented every three years to an individual “…who has made significant, sustained and continuing contributions to research in information retrieval.” A look at her bio will help you understand why she was selected to receive this honor – my browser broke while scrolling down the page. This is a distinguished honor in the Information Retrieval community and really is more like a lifetime achievement award, except Susan didn’t have to go Hollywood to get it. Susan is the second MSR researcher to receive this recognition; Stephen Robertson of the MSR Cambridge Lab received the award in 2000. Congrats Susan! Win it again in 2012 so I can yell “Three-Peat!” down the hall at random.