AI at Scale in Bing

Every day, users from all over the world perform hundreds of millions of search queries with Bing in more than 100 languages. Whether this is the first or the millionth time we see a query, whether the best results for a query change every hour or barely change at all, our users expect an immediate answer that serves their needs. Bing web search is truly an example of AI at Scale at Microsoft, showcasing the next generation of AI capabilities and experiences.

Over the past few years, Bing and Microsoft Research have been developing and deploying large neural network models such as MT-DNN, Unicoder, and UniLM to improve the search experience for our customers. The best of those learnings are open sourced into the Microsoft Turing language models. The large-scale, multilingual Natural Language Representation (NLR) model serves as the foundation for several fine-tuned models, each powering a different part of the search experience. Here are three examples of AI at Scale helping us better fulfill the needs of our search users.

A new intelligent answer: Yes/No summary

It is common practice for users to look across sources to confirm and feel confident about an answer, and we are pushing our intelligent answers to do the same, saving users even more time. We have always gone through many documents to find and generate the best intelligent answer. Going one step further, we want to summarize the answer, possibly with a single word.

To bring this great search experience to users, we started with a large pre-trained language model and we followed a multi-task approach. That means we fine-tuned the model to perform two separate (but complementary) tasks: assessing the relevance of a document passage in relation to the user query, and providing a conclusive Yes/No answer by reasoning over and summarizing across multiple sources.

In the following example, the query is “can dogs eat chocolate,” and we synthesize across sources to generate an unambiguous No. Thanks to the language understanding brought by NLR, the model can infer that “chocolate is toxic to dogs” means that dogs cannot eat chocolate, even though an individual source may not explicitly say that.
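As a rough illustration of how the two task outputs could be combined, the sketch below aggregates per-passage scores into a single Yes/No summary. It is a toy under stated assumptions, not Bing's production logic: the `PassageScore` type, the `summarize_yes_no` function, the relevance-weighted average, the thresholds, and all scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PassageScore:
    """Hypothetical per-passage outputs of the two fine-tuned tasks."""
    text: str
    relevance: float  # task 1: relevance of the passage to the query, in [0, 1]
    p_yes: float      # task 2: probability that the passage supports "Yes", in [0, 1]


def summarize_yes_no(passages, min_relevance=0.5, margin=0.2):
    """Combine per-passage scores into "Yes", "No", or None (no clear answer).

    Assumed aggregation rule: a relevance-weighted average of p_yes over the
    passages that pass the relevance threshold.
    """
    relevant = [p for p in passages if p.relevance >= min_relevance]
    if not relevant:
        return None
    total_weight = sum(p.relevance for p in relevant)
    weighted_yes = sum(p.relevance * p.p_yes for p in relevant) / total_weight
    if weighted_yes >= 0.5 + margin:
        return "Yes"
    if weighted_yes <= 0.5 - margin:
        return "No"
    return None  # sources disagree too much for a one-word summary


# Made-up scores for the query "can dogs eat chocolate".
passages = [
    PassageScore("Chocolate is toxic to dogs.", relevance=0.95, p_yes=0.03),
    PassageScore("Theobromine in chocolate can poison dogs.", relevance=0.90, p_yes=0.05),
    PassageScore("Best chocolate cake recipes.", relevance=0.10, p_yes=0.60),
]
answer = summarize_yes_no(passages)
print(answer)
```

The off-topic passage is filtered out by the relevance task before the answers are combined, which is one reason the two tasks complement each other.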


This new search experience is live in the United States and will be expanded to more markets soon.

Expanding our intelligent answers globally

Last year, using multi-task deep learning, we created a Turing NLR-based model that improved both intelligent answers and caption generation – the links and descriptions lower down on the page – in English-speaking markets. It was the first time Bing used a single model to perform both tasks.

In order to bring the same improvements to our users globally, we took what is known as a zero-shot approach to the problem. While the NLR model is pre-trained on 100 different languages, the question answering model is fine-tuned only with English data. Even though that fine-tuned model has never seen labeled data in other languages, it is able to draw from the knowledge accumulated and the nuances of language learned by NLR to significantly improve our intelligent answers globally.
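The zero-shot idea can be sketched with a toy example: a classifier is fitted on English labels only, then applied to Italian text through a shared representation. Everything here is an assumption made for illustration: the hand-written `SHARED_EMBEDDINGS` table stands in for a multilingual pre-trained encoder, a nearest-centroid classifier stands in for fine-tuning, and the two labels are invented. It is not the question answering model described above.

```python
# Hypothetical stand-in for a multilingual encoder: words with the same meaning
# in English and Italian map to nearby points in one shared vector space.
SHARED_EMBEDDINGS = {
    "benefits": (1.0, 0.1), "benefici": (0.9, 0.2),
    "vitamins": (0.9, 0.0), "vitamine": (1.0, 0.1),
    "recipe": (0.1, 1.0), "ricetta": (0.2, 0.9),
    "oven": (0.0, 0.9), "forno": (0.1, 1.0),
}


def encode(text):
    """Average the vectors of known words; unknown words are ignored."""
    vectors = [SHARED_EMBEDDINGS[w] for w in text.lower().split() if w in SHARED_EMBEDDINGS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def fit_centroids(labeled_examples):
    """Fine-tuning stand-in: compute one centroid per label."""
    grouped = {}
    for text, label in labeled_examples:
        grouped.setdefault(label, []).append(encode(text))
    return {
        label: tuple(sum(axis) / len(vecs) for axis in zip(*vecs))
        for label, vecs in grouped.items()
    }


def predict(centroids, text):
    """Return the label whose centroid is closest to the encoded text."""
    vec = encode(text)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Labeled data exists in English only.
english_training_data = [
    ("turnip benefits vitamins", "health"),
    ("benefits of vitamins", "health"),
    ("turnip recipe oven", "cooking"),
    ("oven recipe", "cooking"),
]
centroids = fit_centroids(english_training_data)

# The classifier has never seen a labeled Italian example.
italian_prediction = predict(centroids, "benefici delle vitamine")
print(italian_prediction)
```

The transfer works only because the representation places translations close together; the classifier itself has no notion of language.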

The following example is a query in Italian that translates to “red turnip benefits”. The intelligent answer returned by Bing is generated from a top search result, using the very same universal model that we operate in English-speaking markets.

This improved universal model is currently running in 13 markets and we are looking forward to releasing it globally over the next few months.

Understanding query intent to improve search relevance

At the core of search relevance is understanding the user intent behind a search query and returning the web pages that best fulfill it. To solve this task at scale, we created an NLR-based model that is fine-tuned to rate potential search results for a given query, using the same scale as our human judges.

This model understands complex and ambiguous concepts much better than its predecessor. Consider the query “brewery germany from year 1080”. It turns out there is no known German brewery founded in that exact year. However, we can assume the user was looking for a very old (millennium-old!) brewery in Germany, even though they may have misremembered or mistyped the year.

Whereas our previous model returned a somewhat generic list of German breweries, our new NLR-based model correctly identified the Weihenstephan Brewery, founded during that time period (but in year 1040 instead of year 1080).
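Because the model rates results on the judges' scale, its ratings can be used to order results and can be compared directly with human labels. The sketch below does this with NDCG, a common ranking metric. The post does not say which metric Bing uses, so the choice of NDCG, the 0 to 4 grade scale, and all ratings shown are assumptions.

```python
import math


def dcg(grades):
    """Discounted cumulative gain for relevance grades listed in ranked order."""
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(grades))


def ndcg(model_ratings, judge_grades):
    """NDCG of the ordering induced by model_ratings, scored with judge_grades."""
    # Rank results by the model's predicted rating, highest first.
    order = sorted(range(len(model_ratings)), key=lambda i: model_ratings[i], reverse=True)
    ranked_grades = [judge_grades[i] for i in order]
    # The ideal ordering sorts results by the human grades themselves.
    ideal = dcg(sorted(judge_grades, reverse=True))
    return dcg(ranked_grades) / ideal if ideal > 0 else 0.0


# Hypothetical judge grades (0 = irrelevant, 4 = perfect) for three candidate pages.
judge_grades = [1, 4, 0]

# A model that ranks the best page first, and one that ranks it last.
good_model = ndcg([1.2, 3.7, 0.4], judge_grades)
weak_model = ndcg([3.0, 0.5, 1.0], judge_grades)
print(good_model, weak_model)
```

A score of 1.0 means the model's ordering matches the ideal ordering implied by the human grades; lower scores mean highly graded pages were ranked too low.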




Running such powerful models at web-search scale can be prohibitively expensive. To improve search relevance for all our users globally, wherever they are located and whatever language they speak, we implemented many GPU optimizations, and we run this model on state-of-the-art Azure GPU Virtual Machines. We also wanted to contribute back to the NLP community, so earlier this year we open sourced these optimizations into the ONNX Runtime.


Thanks to NLR and the fine-tuned models built on top of it, we are enabling AI at Scale and crafting a better search experience for our Bing users globally.

We would love to hear your feedback or suggestions! You can submit them using the Feedback button in the bottom right corner of the search results page.