Bing delivers text-to-speech and greater coverage of intelligent answers and visual search

At NVIDIA’s GPU Technology Conference this week, Bing demonstrated natural-sounding text-to-speech AI, expanded intelligent answers, and the ability to quickly see multiple objects auto-detected within an image so you can search for visual matches. All of these features help you find what you’re looking for faster, and all are powered by Azure virtual machines running on NVIDIA GPUs optimized with NVIDIA CUDA-X AI software libraries.


The updated Bing app can now convert text to speech, meaning Bing can speak answers to your queries back to you in a voice that’s nearly indistinguishable from a human’s. This advance was made possible by breakthroughs in deep neural networks that give our AI human-like intonation and clear articulation of words. Beyond these improvements to our conversational AI, offering this capability as a real-time service would not be possible without the greater processing power of NVIDIA GPUs.

The Bing app also supports speech as an input, meaning you can speak to your mobile device and Bing will convert your spoken words to text and run that query for you. Simply press the microphone button on the app homepage, speak your question, and you’ll get search results.

Intelligent answers

Bing intelligent answers allow you to get comprehensive, summarized information aggregated across several sources in response to certain queries.

We’re now taking intelligent answers one step further by advancing our deep learning models. These models require a lot of processing power, so we’re leveraging recent advances in GPU technology that allow us to process entire web pages much faster and more efficiently than traditional models powered by CPUs. This advance allows us to provide answers to harder questions than ever before. For example, beyond the relatively simple answer to ‘what is the capital of Bangladesh’, Bing can now answer more complex questions, such as ‘what are different types of lighting for a living room’, and do so more quickly than before.

Visual search

Visual search is another area in which recent developments have enabled huge strides in efficiency and coverage.

Visual search allows you to search using an image. For example, if you see an image of an accent light you like, Bing can show visually similar decor and even show purchase options at different price points if the item is available online. To save you time, visual search also automatically detects important objects you may want to search for next and places clickable hotspots over them.

Our advanced visual search capabilities, such as object detection, are quick and automatic because they use NVIDIA GPUs for inferencing. GPU inference has proven dramatically more efficient than CPU-powered inference, which is what unlocks this scenario for our customers.

New intelligent scenarios powered by Azure and NVIDIA

These new scenarios are all made possible by running Bing on Azure N-series virtual machines with NVIDIA GPUs. Text-to-speech, speech-to-text, intelligent answers, and visual search are all part of the next great search frontier, and we’re very excited to see what our partnership continues to enable in the future.