This month, we released a new Bing image search experience designed to help customers be inspired, learn more, and do more with image search. We discover billions of images on the Internet, but understanding the searches and providing helpful information for each image is an enormous challenge. In this blog post, my colleagues Arun Sacheti and Eason Wang provide a deeper look at how we built the image graph that allows us to automatically derive useful descriptions, captions and actions for images in our new search experience.
Dr. Jan Pedersen, Chief Scientist, Bing and Information Platform R&D
From Image URL Graph to the Image Graph
The first generation of image search relied on similar principles to web search, where content is analyzed based on URLs (Figure 1).
Figure 1: Web URL Graph
A URL is assumed to be unique, so each image URL is treated as a unique image. But as we know, an image often appears many times on the web. For instance, a product can be sold on different commerce sites yet be represented by the same image. A photo of a celebrity can appear in many places on the web, including news sites, blogs, or as part of tweets. The same image content also appears in different sizes with small visual differences, as illustrated below. The first and second images are essentially the same but differ in size, cropping and overlaid text.
Our data shows that 59% of images have at least one duplicate somewhere on the web. As illustrated in the chart below, images are often duplicated hundreds – even thousands of times. In duplication cases, the image content is the same, but each page has unique document text and query associations that place the image in context. Alterations to otherwise identical images, such as cropped photos or photo edits, as seen in the earlier example, pose additional challenges to image search. Our customers would be forced to visit and analyze each of these web documents to learn about the content within the image.
Fortunately, modern Image Understanding techniques create a big opportunity to bring all the metadata together for exact or near duplicate image content that is currently fragmented across many URLs. Bing seizes this opportunity by analyzing every image’s content and context and constructs an advanced graph clustering similar images and all their associated metadata together. This graph, which is pivoted on content within the image rather than URLs, is what we refer to as the Image Graph and is the foundational building block of our new experience.
The corpus of web images has grown exponentially to hundreds of billions. At this scale, computing the Image Graph from image content alone is very costly. Furthermore, the graph is updated constantly to keep up with the millions of new images added every day. Building on technology developed in collaboration with Microsoft Research, the Bing Team has been able to build an efficient and scalable graph with hundreds of billions of nodes and counting.
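To make the clustering idea concrete, here is a minimal sketch of grouping near-duplicate images. It uses a simple 8x8 average hash and a union-find pass as stand-ins for Bing's actual (unpublished) fingerprinting and large-scale clustering; images are represented as toy grayscale grids rather than decoded files.

```python
# Hypothetical sketch of near-duplicate image clustering.
# Real systems decode pixels and use approximate-nearest-neighbor search;
# the O(n^2) comparison below is for illustration only.

def average_hash(pixels):
    """64-bit hash over an 8x8 grid: each bit says whether a pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def cluster(hashes, max_dist=4):
    """Union-find over images whose hashes differ in at most max_dist bits."""
    parent = list(range(len(hashes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(hashes)):
        for j in range(i + 1, len(hashes)):  # at web scale this would be LSH, not a scan
            if hamming(hashes[i], hashes[j]) <= max_dist:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(hashes))]
```

A slightly brightened copy of an image hashes to the same fingerprint (each pixel shifts relative to the same mean), so the two land in one cluster, while an unrelated image stays separate.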
Figure 2: Image Content Graph
Amplifying the Image Graph
Furthermore, the Image Graph becomes even more powerful as we link acquired knowledge to the unique content within the graph, including:
- Key concepts within the image, identified through state-of-the-art image understanding capabilities developed by Microsoft Research and Bing
- Context around the image from the web page hosting it
- High volume user interaction with the content
- Knowledge received directly from partners through annotated feeds
Our goal is to help users be inspired, learn more and do more with image search. Our team mines each of these sources to generate datasets spanning billions of content clusters to support each of these user goals. Examples include Best Representative Query (BRQ), Captions, Related Collections, Shopping Sources and More Sizes.
Figure 3: Adding Knowledge to the Image Graph
The following section explains generation of these datasets in more detail.
Be Inspired through Human Curated Collections
Passionate curators painstakingly create topical image collections reflecting their unique style on Pinterest, Tumblr, Etsy, etc. In 2013, we enabled search within these collections for the first time (blog), but collections remained independent from one another and from individual images.
With the Image Graph, we are able to find deep links among collections. If one image appears in a few different collections, these collections are connected with each other.
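The linking step described above is essentially an inverted-index join: map each image cluster to the collections containing it, then connect collections that co-occur. A small sketch, with made-up collection and image names:

```python
# Illustrative only: collection names and image ids are invented.
from collections import defaultdict

collections = {
    "green-shades":   {"img_sofa", "img_lamp", "img_curtains"},
    "everyday-green": {"img_sofa", "img_dress", "img_shoes"},
    "beach-houses":   {"img_deck", "img_hammock"},
}

def related_collections(collections):
    """Map each collection to the others that share at least one image."""
    by_image = defaultdict(set)          # image -> collections containing it
    for name, images in collections.items():
        for img in images:
            by_image[img].add(name)
    related = defaultdict(set)
    for names in by_image.values():      # collections co-occurring on an image are linked
        for a in names:
            related[a] |= names - {a}
    return dict(related)
```

Here "green-shades" and "everyday-green" are linked because both contain "img_sofa", while "beach-houses" shares no image and stays unconnected.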
Figure 4: Linking the Image Graph to Human Curated Collections
People who built similar collections are likely to share similar interests. When our users explore an image in a collection, we can now show other collections containing that image. This is a new path to related content that would otherwise have gone undiscovered, and a useful way for our customers to find great ideas and get inspired.
Consider this image for a search on “living room decorating ideas”. If our customer likes this decorating style, we can offer insight through others who collected the same image. In this case, the same image has been pinned to multiple collections on Pinterest.
The various collections containing this image reflect a few different interests. This user has collected quite a few ideas for shades in green, the same attractive color featured in the image.
Another user has a collection of everyday things in green, including home furnishings, shoes, dresses, etc.
With these connections made through the Image Graph, we’ve connected our customers to others who share an interest in similar decorating styles. People can then discover new content related to decorating ideas and dressing styles that they are likely to be interested in.
Learn More with Best Representative Query (BRQ)
In first-generation image search experiences, the user’s own search term is often the only description available for the images returned. We associate thousands of search terms with many of the images in the search index, and this has long been a ranking ingredient.
We believe that every image should be clearly and simply identified to our users. We challenged ourselves to identify a minimal and human-readable set of terms that can identify the key concept in the image. We called this the Best Representative Query – the BRQ.
We leveraged the Image Graph to mine user interactions with its content clusters. We also looked at the web page context where images are hosted. With this data, we built an algorithm to extract the image’s primary focus. The algorithm has been applied to billions of images and the BRQ is preserved in the Image Graph. In Bing Image Search, the BRQ is displayed on the hover of search result as well as in the new image viewing experience. The BRQ is designed to be search-friendly, so you are one click away from learning more about the key concept for each image result.
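As a rough illustration of BRQ selection, consider scoring candidate queries that have been aggregated across all of a cluster's duplicate URLs. The scoring below (click volume with a mild brevity penalty, keeping the query minimal and human-readable) is our own simplification; Bing's actual algorithm is not published.

```python
# Hypothetical BRQ scoring: prefer heavily clicked queries, lightly
# penalizing long ones. The weights are illustrative, not Bing's.

def best_representative_query(query_clicks):
    """query_clicks: {query: clicks aggregated across all duplicate URLs
    of one image cluster}. Returns the highest-scoring query."""
    def score(item):
        query, clicks = item
        return clicks / (1 + 0.2 * len(query.split()))  # brevity penalty
    return max(query_clicks.items(), key=score)[0]

clicks = {
    "mercury": 120,
    "mercury planet": 400,
    "picture of the planet mercury in the solar system": 450,
}
best_representative_query(clicks)  # -> "mercury planet"
```

Even though the long query has slightly more clicks in this toy data, the brevity penalty steers the BRQ toward the short, search-friendly form.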
You can check out more examples of BRQs for these queries: “tie dye dress”, “tofu recipes”, “clipart”.
Learn More with Rich and Descriptive Image Captions
A good caption is a snippet that helps you understand an image and provides a link to the webpage where you can continue your research. It can even open your mind to think or act differently. But how can machines “understand” what an image is about? Researchers in computer vision from Microsoft Research and engineers from Bing have been working on this question for many years.
With the Image Graph, we can mine text from all of the pages containing the image. This leads to tens of thousands of candidate snippets for the image. The challenge we had was to identify the most useful snippet – a needle in a haystack.
We defined ideal captions as ones that would:
- Help with understanding the image
- Be interesting enough to stimulate curiosity
- Identify key concepts within the image
- Be grammatically correct
We were able to achieve this at scale by utilizing knowledge of user interaction with the content and applying image understanding models to determine key concepts within the image. We then ranked all the candidate captions based on relevance to the image, readability and importance to users.
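A toy version of this caption selection might score each candidate snippet by its overlap with the image's key concepts and by how close it is to a readable length, then keep the best. The features and weights here are purely illustrative stand-ins for the relevance, readability and importance signals described above.

```python
# Illustrative caption ranking: concept overlap minus a length penalty.
# Real rankers use far richer relevance and readability features.

def rank_captions(candidates, key_concepts, ideal_len=25):
    """candidates: text snippets mined from pages hosting the image.
    key_concepts: terms from image-understanding models, e.g. {"mercury", "planet"}."""
    def score(text):
        words = set(text.lower().split())
        concept_hits = len(words & key_concepts)                 # identifies key concepts
        length_penalty = abs(len(text.split()) - ideal_len) / ideal_len
        return concept_hits - length_penalty
    return sorted(candidates, key=score, reverse=True)
```

Given a boilerplate snippet, a one-word fragment and a descriptive sentence, the descriptive sentence wins because it mentions the most key concepts at a reasonable length.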
For example, the image below is from serc.carleton.edu on a page with virtually no descriptive text. Fortunately, we were able to caption it with text from another web page at dailymail.co.uk that contained the same image.
“Mercury is the innermost planet in the solar system – and it would make sense for the tiny planet to be ‘tidally locked’ to the sun, with one ‘day’ being the same as the planet’s ‘year’.”
The additional data helps users to learn more about an image. You can try more examples below to see how our new captions can help, e.g. “praying mantis“, “sir francis drake“, “how does a rainbow form” and “how does the eye work“.
Do More through Shopping Sources
Our customers often come to Bing image search for “window shopping” and purchasing. For example, you can use image search to find a dining set for your home. Suppose you love this dining set and want to purchase it: how can you buy it?
As you are searching and looking at this set or other sets we are able to suggest multiple shopping sources where this set is available (beta).
Most image results in these scenarios are not from e-commerce sites, which can cause a lot of frustration. With our quest to help users do more with Bing Image Search, we connect the dots to help our customers map one image to multiple shopping sources where the product can be found for sale and enable deal shopping.
We regularly index shopping pages across the web. A page can be classified as a shopping page based on the content within the page as well as structured tags using OpenGraph, Schema.org, RDFa and Microdata/Microformats. Based on the Image Graph, we can find the various shopping sites selling the same product by flagging where the same image content appears on different shopping pages. More details on the specific formats and schemas are covered in our webmaster blog.
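The structured-markup signals mentioned above can be sketched as a simple scan for product vocabulary in a page's HTML. A real crawler parses the DOM and JSON-LD properly; this string check only illustrates which vocabularies (Schema.org Microdata and JSON-LD, RDFa, OpenGraph) are being looked for.

```python
# Simplified detector for product markup; markers are representative
# patterns, not an exhaustive or robust parser.
PRODUCT_MARKERS = (
    'itemtype="http://schema.org/product"',          # Microdata (Schema.org)
    'itemtype="https://schema.org/product"',
    '"@type": "product"',                            # JSON-LD (Schema.org)
    'typeof="schema:product"',                       # RDFa
    '<meta property="og:type" content="product"',    # OpenGraph
)

def looks_like_shopping_page(html):
    """True if the page carries any recognizable product markup."""
    html_lower = html.lower()
    return any(marker in html_lower for marker in PRODUCT_MARKERS)
```

Pages flagged this way, combined with image clusters, let one product image be mapped to every store page where it appears.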
Reverse Image Look-up
We know that you see images while browsing the internet that you want to learn more about. Opening a separate image search session is inconvenient, and finding the same image again is sometimes impossible.
The Image Graph enables an easy reverse look-up of image content in near real-time, and we have made reverse look-up easy through our Chrome extension.
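One way to see why this look-up can be near real-time: if every image fingerprint is precomputed into an index, a query from the extension becomes a dictionary probe rather than a scan. The sketch below assumes exact-fingerprint matching; Bing's actual index structure is not public, and near-duplicate matching would additionally need multi-probe or LSH techniques.

```python
# Toy reverse-lookup index: fingerprint -> set of known image ids.

def build_index(image_hashes):
    """image_hashes: {image_id: 64-bit fingerprint} computed at crawl time."""
    index = {}
    for image_id, h in image_hashes.items():
        index.setdefault(h, set()).add(image_id)
    return index

def reverse_lookup(index, fingerprint):
    """Return all known images sharing this exact fingerprint (O(1) probe)."""
    return index.get(fingerprint, set())
```

All knowledge attached to the matching cluster (BRQ, captions, shopping sources) can then be surfaced for the queried image.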
Figure 5: Enabling Reverse Look-up to connect the Image Graph
After installing the Bing Image Match extension, go to any website and hover over any image. You’ll see a search icon appear in the upper right-hand corner of the image. Click the icon and you’ll see an overlay with rich information from the Image Graph.
You can try a reverse lookup of images on these pages: rustic wagon coffee table, riding boots knee high socks, chicken parmesan with pasta, capitol reef national parks. With the reverse lookup, all the knowledge from the Image Graph is exposed without you having to do any searching. You can even see more visually and semantically similar images in the filmstrip below the image through the BRQ.
The evolution of the Image Graph has come a long way, from URL graph, to content graph, to knowledge graph. Bing image search has brand new experiences powered by the Image Graph. We are committed to building out the Image Graph, and sharing its knowledge with you through Bing and other Microsoft experiences. We will continue to seek out ways to help our users be inspired, learn more and do more with Bing Image Search.
We are eager to hear from you. Tell us about your experience with Bing image search via Bing Listens.
Arun Sacheti (Group Engineering Manager) and Dr. Eason Wang (Program Manager)
On behalf of the Bing Image Search Team