A Behind the Scenes Look at How Bing is Improving Image Search Quality

In this post we wanted to take the opportunity to give you a behind the scenes look at the ongoing work we are doing to improve image search quality at Bing. This blog will give you an overview of the many years of work done in Bing Research and Development in collaboration with Microsoft Research in our quest to deliver the most relevant images possible. My colleague Meenaz Merchant will give you a closer look at our approach and how that compares to Google in order to showcase the respective strengths and challenges moving forward. We hope this post will serve to spark a conversation on image search that helps the search industry move forward and in the end benefit you when you’re searching for images.

Dr. Harry Shum, Corporate Vice President, Bing R&D

Every day Bing receives millions of searches from across the Web, and nearly 10 percent of those searches are for images. With 40 percent of search results including some kind of visual component, we know that people are more likely to click on a web page that includes images or videos. People search for a variety of images ranging from celebrities, artwork, products, fashion, attractions, and landscapes to clip-art, illustrations, animations, borders, icons, logos, and backgrounds to name a few. Searching for images allows people to learn about new concepts, browse engaging pictures or complete tasks like finding an image for a school report or presentation. Expectations for search engines to understand what is being searched for are high, and Bing has made significant advancements behind-the-scenes to continue to deliver relevant, beautiful images and improve the image search experience.

Since we launched in 2009, the number of image searches has grown by 520 percent. To achieve this, we have utilized a number of approaches including natural language processing, entity understanding, computer vision and machine learning to provide high quality, relevant images. By leveraging the big data generated from billions of searches and information contained within images alongside petabytes of signals from social networks and billions of clicks, we have designed massive machine learning systems that attempt to determine the intent of your image search. With the focus on natural language and entity understanding, for instance, we have improved Bing’s ability to understand people’s intent beyond just queries and keywords. 

Let’s look at how we are employing these various techniques:

1. Entity Understanding – Is it a person, place or thing? A key aspect in improving image search quality is understanding the focal object or person that someone is looking for. For example, with a search for Prince Albert of Monaco, the goal is likely to learn who the Prince is and see multiple images of him. Here, Bing understood the search, mapped it to the right entity and then found the corresponding image documents that describe the entity. Based on entity understanding for Prince Albert of Monaco, Bing is able to understand the primary intent of the search and find relevant images of what the Prince looks like. Google has taken a different approach, showing images of the Prince and Princess as a couple and in social settings, which interprets the entity in a different way. On Bing, those results are pushed further down rather than mixed in. The same goes for other types of entities, like attractions or photographs of landmarks.

Bing’s results

Image 1 Albert

Google’s results

Image 2 Albert

2. Big Data: Customer feedback is an extremely useful signal. When customers click on pictures they enjoy and find useful, Bing is able to use that data to interpret future searches and find more relevant results. By using image click data from the Web and social signals, Bing combines visual and text features to better understand what the searcher is asking for. For instance, in a search for Snow White portraits, Bing can understand that the customer is looking for paintings and pictures of Snow White by interpreting image click data to help determine which individual images are the most popular and most relevant. You’ll see that Google has interpreted the search in a different way, showing photos of Snow White characters and images from the movie Snow White and the Huntsman.

Bing’s results

Image 3 snow white

Google’s results

Image 4 Snow White
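To make the click-data idea from item 2 concrete, here is a minimal sketch in Python. It only counts clicks per query and orders images by popularity; the function name `rank_by_clicks`, the log format and the image IDs are hypothetical, and the post does not describe how Bing actually combines click data with visual and text features.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_by_clicks(click_log: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Order images for each query by how often they were clicked.

    click_log holds (query, image_id) pairs. This format is an assumption
    made for illustration, not a real log schema.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for query, image_id in click_log:
        # Normalize the query so "Snow White" and "snow white " share counts.
        counts[query.strip().lower()][image_id] += 1
    # most_common() returns images from most clicked to least clicked.
    return {
        query: [image_id for image_id, _ in counter.most_common()]
        for query, counter in counts.items()
    }


# Hypothetical log: paintings are clicked more often than a movie still.
example_log = [
    ("Snow White portraits", "painting_1"),
    ("snow white portraits", "painting_1"),
    ("snow white portraits", "painting_1"),
    ("snow white portraits", "painting_2"),
    ("snow white portraits", "painting_2"),
    ("snow white portraits", "movie_still"),
]
ranking = rank_by_clicks(example_log)
```

A production system would presumably treat click counts as one signal among many rather than as the final ordering.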

3. Computer Vision Technologies – Duplicating human vision abilities: An important challenge in image search is being able to process images similar to the way the human brain sees them. It’s important to process all the images coming in to Bing’s database and to generate image content that Bing can use to rank based on relevance techniques. In particular, Bing uses computer vision technologies such as deep learning to interpret high dimensional data from the real world to understand the image better. A search for the tallest peaks is a good example of the benefit of leveraging these technologies, as Bing is able to process the image data of tallest peaks and integrate a wide range of visual perceptions similar to how a human brain processes and perceives images. You can see Bing’s results show images of the top ten tallest peaks and other major mountains in a variety of regions from around the world, whereas Google shows images of people reaching the summit of mountains, graphs, maps and other views of mountain ranges that aren’t necessarily the tallest peaks.

Bing’s results

Image 5 peaks

Google’s results

Image 6 peaks

4. Thematic Intent Focus – Capturing the broader theme: Once Bing understands the intent of a person’s search, it does not treat an image search the same way as a standard web search. For example, when searching for photos of Italy, Bing understands the main intent of the search is to see beautiful photographs of Italy. However, Bing goes beyond just taking words like “photos” and “Italy” and looking for images on web pages that have these terms next to the images. Instead, Bing captures the theme of the search and matches it to text and image features that have the same theme – what we call thematic intent focus. You’ll see below that Bing has found iconic imagery of major landmarks in Italy, providing a spectrum of relevant results for “photos of Italy.” Google, on the other hand, has pulled a mix of similar images, but due to its focus on the text feature “Italy,” it also shows results such as a map of Italy and the Bank of Italy in California.

Bing’s results

Image 7 italy

Google’s results

Image 8 italy

5. Exact and Near Duplicates – Recognizing the copy cats: A common problem on the internet is the duplication of publicly available images. Some images are exact duplicates while others are near duplicates. Near duplicates are images that have gone through editing steps such as color mapping, scaling, padding or cropping. Identifying near duplicates is a challenge due to the sheer size of the index and the millions of new images being discovered every day. We are continuously improving our detection process to recognize duplicates and near duplicates faster and more accurately, so you have a broader selection of images to choose from. For example, when searching for images of Harry and Hermione, Bing’s algorithms quickly group images with similar attributes to identify duplicates and near duplicates, providing more original content and relevant results for customers. You’ll see below that Bing’s results show a variety of images of Harry and Hermione from different scenes, giving a good overview of the many images of the two characters. Google’s results for Harry and Hermione suggest that it has a harder time recognizing duplicate and near duplicate images.

Bing’s results

Image 9 harry

Google’s results

Image 10 harry
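One well-known way to group near duplicates like those described in item 5 is a perceptual hash. The sketch below uses an "average hash" compared by Hamming distance. This is a swapped-in textbook technique for illustration, not Bing's algorithm, which the post does not describe. The function names, the threshold of 2 bits and the tiny 4x4 pixel grids are all assumptions; real images would first be converted to grayscale and downscaled to a fixed small grid.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel intensities in the range 0-255


def average_hash(pixels: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 2) -> bool:
    """Treat two same-sized grids as near duplicates if their hashes are close.

    max_distance is an illustrative threshold, not a tuned value.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Hypothetical 4x4 "images": a striped pattern, a brightened copy of it,
# and the same pattern with light and dark columns swapped.
original = [
    [10, 200, 30, 220],
    [15, 210, 25, 230],
    [12, 205, 35, 215],
    [20, 190, 40, 225],
]
brightened = [[min(255, value + 20) for value in row] for row in original]
swapped = [[row[1], row[0], row[3], row[2]] for row in original]
```

Because each pixel is compared with the image's own mean, a uniform brightness change leaves the hash unchanged, so `brightened` matches `original` while `swapped` does not. Edits such as cropping or padding would need a more robust approach than this sketch.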

6. Aesthetics as a Feature – Making it easier to view at-a-glance: Another important aspect of image search, which is different from web search, is that people want to see multiple images at a glance rather than dig through web pages to find the right image. When a person sees multiple images at a time, it’s easier to compare and contrast the different photos, so the visual appeal of the images as a set, and how they interact with each other, becomes important. Let’s take a search for Stanford University as an example. By looking at image features across the results, we are able to cluster the results in such a way that they are easy to see at-a-glance, with the images matching the main intent right on top.

Bing’s results

Image 11 stanford

Google’s results

Image 12 stanford

7. Quality of Image as a Feature: Most people browse through multiple images when conducting a search for images. On Bing, when you click on an image thumbnail, a larger image appears on your screen. This allows you to fully experience the picture and quickly browse to the next. When viewing such images on your PC or tablet, it’s important that these images are of high quality with attractive elements like good contrast, lighting and composition. Relevance being equal, Bing prefers high quality images to low quality images to ensure that people have the optimal viewing experience. Let’s examine the top results for a search of Khloe Kardashian where we’ve included the image resolution at the bottom of each image for your reference. You’ll notice the resolution of Bing’s images is over twice as high as that of Google’s.

Bing’s results

Image 13 Khloe

Google’s results

Image 14 Khloe
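The rule in item 7, "relevance being equal, prefer higher quality," amounts to a two-level sort. Here is a minimal Python sketch under stated assumptions: quality is approximated by pixel count alone (the post also mentions contrast, lighting and composition, which are not modeled here), and the `ImageResult` type, its fields and the example URLs are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ImageResult:
    url: str
    relevance: float  # higher means more relevant (illustrative scale)
    width: int
    height: int


def quality(result: ImageResult) -> int:
    """Stand-in quality score: total pixel count (an assumption)."""
    return result.width * result.height


def rank(results: List[ImageResult]) -> List[ImageResult]:
    """Sort by relevance first; break exact ties with the quality score."""
    return sorted(results, key=lambda r: (-r.relevance, -quality(r)))


# Hypothetical results: two equally relevant images and one more relevant one.
results = [
    ImageResult("https://example.com/small.jpg", 0.90, 400, 300),
    ImageResult("https://example.com/large.jpg", 0.90, 1600, 1200),
    ImageResult("https://example.com/best-match.jpg", 0.95, 200, 200),
]
ranked = rank(results)
```

Quality only decides the order between the two equally relevant images here; the low resolution but more relevant image still ranks first.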

The examples above are only a sample of the improvements and challenges facing the search industry when it comes to image search. With 40 percent of searches including some visual component, there is a tremendous opportunity for us to build on the advancements we’ve made.

To learn more, watch this video of Stefan Weitz and me discussing image quality at Bing.

– Meenaz Merchant, Senior Program Manager, Bing R&D
