Challenging the Challenge to the Bing It On Challenge

Ian Ayres posted an article on Freakonomics yesterday challenging the claims that have been used on the Bing It On website.

A couple of notes are important before I talk about Ayres’ claims. There are two separate claims that have been used with the Bing It On challenge. The first is “People chose Bing web search results over Google nearly 2:1 in blind comparison tests”. We blogged about the method here, and it was used back in 2012. In 2013, we updated the claim to “People prefer Bing over Google for the web’s top searches”, which I blogged about here. Ayres frequently goes back and forth between the two claims in his post, so I wanted to make sure both were represented. Now, on to Ayres’ issues and my explanations.

First, he’s annoyed by the sample size, contending that 1,000 people is too few to obtain a representative sample on which to base a claim. Interestingly, Ayres then links to a paper he put together with his grad students, in which they also use a sample size of 1,000 people. They then subdivide the sample into thirds for different treatment conditions and yet still manage to meet conventional tests of statistical significance using their sample.

If you’re confused, you’re not alone. A sample of 1,000 people doing the same task has more statistical power than a sample of 300 people doing the same task. That is why statistics are so important: they help us understand whether the data we see is an aberration or a representation. A truly representative sample of 1,000 people is actually fairly large. As a comparison, the Gallup poll on presidential approval is just 1,500 people.
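
For a rough sense of what those sample sizes buy you, here is a minimal sketch of the textbook 95% margin of error for a proportion. It assumes a simple random sample and the worst-case split of 50/50; the function name is mine for illustration, and this is not the analysis used in either the Bing It On studies or Ayres’ paper.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample of size n; p=0.5 is the worst case,
    and z=1.96 is the usual normal critical value for 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

if __name__ == "__main__":
    # 300 ~ one treatment arm, 1,000 ~ the full sample, 1,500 ~ a Gallup-sized poll
    for n in (300, 1000, 1500):
        print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under those assumptions, 1,000 people gives a margin of roughly three percentage points, noticeably tighter than 300 people and only modestly wider than 1,500.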

Next, Ayres is bothered that we don’t release the data from the Bing It On site on how many times people choose Bing over Google. The answer here is pretty simple: we don’t release it because we don’t track it. Microsoft takes a pretty strong stance on privacy and unlike in an experiment, where people give informed consent to having their results tracked and used, people who come to BingItOn.com are not agreeing to participate in research; they’re coming for a fun challenge. It isn’t conducted in a controlled environment, people are free to try and game it one way or another, and it has Bing branding all over it.

So we simply don’t track their results, because the tracking itself would be incredibly unethical. And we aren’t basing the claim on the results of a wildly uncontrolled website, because that would also be incredibly unethical (and entirely unscientific).

Ayres’ final issue is the fact that the Bing It On site suggests queries you can use to take the challenge. He contends that these queries inappropriately bias visitors towards queries that are likely to result in Bing favorability.

First, I think it is important to note: I have no idea if he is right. Because as noted in the previous answer, we don’t track the results from the Bing It On challenge. So I have no idea if people are more likely to select Bing when they use the suggested queries or not.

Here is what I can tell you. We have the suggested queries because a blank search box, when you’re not actually trying to use it to find something, can be quite hard to fill. If you’ve ever watched anyone do the Bing It On challenge at a Seahawks game, there is a noted pause as people try to figure out what to search for. So we give them suggestions, which we source from topics that are trending now on Bing, on the assumption that trending topics are things that people are likely to have heard of and be able to evaluate results about.

Which means that if Ayres is right and those topics are in fact biasing the results, it may be because we provide better results for current news topics than Google does. This is supported somewhat by the second claim: “the web’s top queries” are pulled from Google’s 2012 Zeitgeist report, which reflects a lot of timely news that occurred throughout that year.

To make it clear, in the actual controlled studies used to determine what claims we made, we used different approaches to suggesting queries. For the first claim (2:1), participants self-generated their own queries with no suggestions from us. In the second claim (web’s top queries), we suggested five queries of which they could select one. These five queries were randomly drawn from a list of roughly 500 from the Google 2012 Zeitgeist, and they could easily get five more if they didn’t like any queries from the five they were being shown.
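
To picture the query suggestion step in the second study, here is a minimal sketch of drawing five queries at random from a pool of about 500. The pool contents and function name are hypothetical placeholders, and I am assuming each redraw is an independent fresh draw from the full pool; the description above doesn’t specify that detail.

```python
import random

# Hypothetical stand-in for the roughly 500 queries from the Zeitgeist list.
QUERY_POOL = [f"query {i}" for i in range(1, 501)]

def suggest_queries(pool, k=5, rng=random):
    """Return k distinct queries drawn uniformly at random from the pool."""
    return rng.sample(pool, k)

if __name__ == "__main__":
    shown = suggest_queries(QUERY_POOL)
    print("First five:", shown)
    # A participant who dislikes all five simply asks for another draw
    # (assumed here to be independent of the first).
    shown = suggest_queries(QUERY_POOL)
    print("Five more:", shown)
```

Because the draw is uniform, no query in the pool is favored over any other in what a participant sees.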

So there you have it: several claims by someone unfamiliar with the actual Bing It On studies and a few answers from someone intimately familiar with them. For those who have them, I’m always open to questions about the studies we conducted – feel free to shoot me a note or leave a comment.

– Matt Wallaert, Behavioral Psychologist, Bing