Data Mining, Machine Learning, and Bing Testing by Ken Johnston

We don’t share everything about the internals of how Bing operates, but we have said publicly that Bing has its own Hadoop-like system (similar in spirit to MapReduce and the Google File System) that we call Cosmos, and that we keep multiple petabytes of data on this data cloud. One interesting question is what testers can do to improve quality when they have massive amounts of real-world data.

We are moving from traditional testing toward Data Mining and Machine Learning as our major tools for driving improvements to overall quality! The approach is simple: come up with a question, then run the analysis to find the bugs. It’s truly amazing. Let me give you two quick examples.

Question 1: When people type in three consecutive queries and then leave Bing without clicking on a link, how did we fail?

That’s a good question. I would argue that if you run any query on Bing and you don’t find your answer in the top three blue links, we’ve failed. But if a Bing user refines their query two or three times and still doesn’t get an answer that satisfies their need, it’s a definite bug. We run this analysis quite often because we are always looking for ways to improve Bing. Here is one interesting bug we found by asking that question. It’s a bug that both Google and Bing have.

Answer: Users get into a vertical and want to run a web search. To them a vertical is still Bing and should behave the same no matter what.

In this case we found several instances where someone had been using the Bing Shopping feature to research a product choice and then decided to go directly to the retailer to make the purchase. They go to the search box and type in the name of the retailer. Instead of giving them a nice web search result, we keep presenting them with more products.

Step 1: While in Bing Shopping type in a web search for a retailer. For more information on how the Bing Shopping Facebook integration works read this related blog post here.

Step 2: Get frustrated and refine the query. Note that even though the advertisement link would get the user to the retailer, they did not click that link. Instead they abandoned Bing altogether.

Expected Result: Notice the nice web result with deep links, a phone number, and integration with Facebook feeds. That’s what the users wanted.

The competition gets it wrong too: Like I said, it isn’t just Bing that frustrates users this way. The other sites handle the same query from their shopping vertical in a similar way.

That is an example of using data mining to find a category of web searches we call abandoned. There are many cases like this in the Bing product that many developers might argue are features and not bugs. When you use data mining you can show how often users run into them. It’s no longer the opinion of the tester against that of the other engineers. We can quantify the impact and make an objective decision.
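To make the idea of an "abandoned" search concrete, here is a minimal Python sketch of the kind of question you might ask of a query log. It is not how Cosmos or Bing's real pipeline works. The log schema, the sample rows, and the function names are all assumptions made up for illustration. It treats a session as abandoned when it ends with three or more consecutive queries and no click, then reports the rate per vertical.

```python
from collections import defaultdict

# Hypothetical log rows: (session_id, timestamp, event, vertical, text).
# "event" is either "query" or "click". This schema is an assumption,
# not the real Bing log format.
LOG = [
    ("s1", 1, "query", "shopping", "digital camera"),
    ("s1", 2, "query", "shopping", "retailer name"),
    ("s1", 3, "query", "shopping", "retailer name store"),
    ("s2", 1, "query", "web", "weather seattle"),
    ("s2", 2, "click", "web", "result-1"),
    ("s3", 1, "query", "shopping", "laptop"),
    ("s3", 2, "query", "shopping", "laptop deals"),
    ("s3", 3, "click", "shopping", "result-2"),
]


def group_by_session(log):
    """Group events by session and order each session by timestamp."""
    sessions = defaultdict(list)
    for session_id, ts, event, vertical, _text in log:
        sessions[session_id].append((ts, event, vertical))
    for events in sessions.values():
        events.sort(key=lambda e: e[0])
    return sessions


def abandoned_sessions(log, min_queries=3):
    """Map session_id -> vertical for sessions that end with at least
    min_queries consecutive queries and no click after them."""
    abandoned = {}
    for session_id, events in group_by_session(log).items():
        trailing = 0
        for _ts, event, _vertical in reversed(events):
            if event != "query":
                break
            trailing += 1
        if trailing >= min_queries:
            abandoned[session_id] = events[-1][2]
    return abandoned


def abandonment_rate_by_vertical(log, min_queries=3):
    """Fraction of sessions per vertical (vertical of the last event)
    that were abandoned."""
    totals = defaultdict(int)
    for events in group_by_session(log).values():
        totals[events[-1][2]] += 1
    hits = defaultdict(int)
    for vertical in abandoned_sessions(log, min_queries).values():
        hits[vertical] += 1
    return {v: hits[v] / totals[v] for v in totals}


if __name__ == "__main__":
    print(abandoned_sessions(LOG))
    print(abandonment_rate_by_vertical(LOG))
```

A rate like this is what turns "is it a bug or a feature?" into a number that can be compared across verticals. The threshold of three queries comes from the question above; everything else here is a placeholder you would replace with your own log format.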

Question 2: By analyzing metadata and images for movie search results, can we find quality issues with the underlying data? This is a machine learning scenario in which we analyze top queries along with the visual look and data patterns of their results. The algorithm then flags the live data bugs for us.

Example 1: Why is a movie from 2006, with two stars and without the major actress, listed above a movie released in 2010 with three and a half stars and with the major actress in it? See this example for the movie Salt. I suspect most users, and even Angelina Jolie, would prefer the order of the results to be reversed.
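The ordering problem in this example can be illustrated with a small Python sketch. To be clear, this is a plain rule-based check, not the machine-learned flagging described above. The record fields, the class name, and the rule itself are assumptions for illustration only. It flags any pair of results where the lower-ranked entry is newer, better rated, and features the lead performer while the higher-ranked one does not.

```python
from dataclasses import dataclass


@dataclass
class MovieResult:
    """One entry in a movie answer, in ranked order (hypothetical schema)."""
    title: str
    year: int
    stars: float
    has_lead: bool  # True if the performer the searcher likely wants is in the cast


def suspicious_orderings(results):
    """Return (upper_rank, lower_rank) index pairs where the lower-ranked
    result beats the upper one on year, rating, and lead performer."""
    flagged = []
    for i, upper in enumerate(results):
        for j in range(i + 1, len(results)):
            lower = results[j]
            if (
                lower.year > upper.year
                and lower.stars > upper.stars
                and lower.has_lead
                and not upper.has_lead
            ):
                flagged.append((i, j))
    return flagged


# Values taken from the example in the text; the titles are placeholders.
RESULTS = [
    MovieResult("Salt (2006 listing)", 2006, 2.0, False),
    MovieResult("Salt (2010 listing)", 2010, 3.5, True),
]

if __name__ == "__main__":
    print(suspicious_orderings(RESULTS))
```

A hand-written rule like this only catches the pattern you already thought of. The appeal of the machine learning approach is that it can learn which data patterns look wrong across many top queries rather than relying on one fixed rule.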

The role of testing within Bing is changing and evolving rapidly. Personally, I remember when the goal of a tester at Microsoft was to knock out thousands of automated test cases and, when one of them broke, file a bug. The problem with that approach is that quite often you fix bugs a user would never find while missing many bugs the user will eventually run into after the software is released. Add to that an agile development approach like the one we use here in Bing, and you would spend way too much time testing and fixing the wrong priorities. Data Mining and Machine Learning are the critical approaches we use for leaner, more real-time testing.

For all job inquiries or openings for this team please email Cindy Johnson HERE. Or connect with her on LinkedIn.

Feel free to post a comment, ask a question or follow up with me directly. Follow me on Twitter (@RKJohnston) or connect with me on LinkedIn and Facebook.

Ken Johnston is a Principal Test Manager for Bing’s Commerce Team. He is also the author of How We Test Software at Microsoft. Since joining Microsoft in 1998, Johnston has filled many other roles, including test lead on Site Server and MCIS and test manager on Hosted Exchange, Knowledge Worker Services, Net Docs, and the Microsoft Billing and Subscription Platform service. For two and a half years (2004-2006) he served as the Microsoft Director of Test Excellence. In 2003 he earned his MBA from the University of Washington.