Fast Front-End Performance for Microsoft Bing

The Bing search stack is built for speed at all layers. Fast performance delights users and drives loyalty and engagement. This blog post presents the key performance techniques that Bing’s front-end uses to deliver and render search results pages with world-class speed.

Front-End Delivery Architecture

Performance starts with architecture. Search queries to Bing return a “search results page” that is generated using Server-Side Rendering (SSR). The delivery architecture is shown below.

Figure 1: Front-End Delivery Architecture

Chunked Delivery

The search results page HTML is delivered to the client browser in multiple chunks. Chunk 1 contains HTML that is not a function of the specific search terms (the page header with logo, search box, and sign-in button) and is delivered immediately by the front-end servers. In parallel with Chunk 1 delivery, requests are made to the back-end stack that contains the search index and other services. The back-end responses are aggregated into the core “search results” that are delivered as Chunk 2. Subsequent chunks, collectively called Chunk N, contain “above-the-fold” images (visible within the initial viewport) that are Base64-encoded in the HTML. Note that both Chunk 1 and Chunk 2 contain inline CSS and JS in addition to HTML.
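To make the ordering concrete, here is a minimal Python sketch of a handler that streams the page as a generator, assuming a server framework that writes and flushes each yielded string as its own chunk. The function `render_results_page` and the callbacks `fetch_results` and `above_fold_images` are hypothetical stand-ins, not Bing's code; the point is that the back-end call starts before Chunk 1 is emitted, so the two overlap.

```python
import html
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator


def render_results_page(
    query: str,
    fetch_results: Callable[[str], list[str]],
    above_fold_images: Callable[[list[str]], list[str]],
) -> Iterator[str]:
    """Yield the search results page as a sequence of HTML chunks."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Kick off the back-end request first so that its "server think time"
        # overlaps with delivery of Chunk 1.
        pending = pool.submit(fetch_results, query)

        # Chunk 1: query-independent head and page header, flushed immediately.
        yield (
            "<!DOCTYPE html><html><head><style>/* critical CSS */</style></head>"
            "<body><header><!-- logo, search box, sign-in button --></header>"
        )

        # Chunk 2: the core results, sent as soon as the back end responds.
        results = pending.result()
        yield "<ol>" + "".join(f"<li>{html.escape(r)}</li>" for r in results) + "</ol>"

        # Chunk N: Base64 data URIs for above-the-fold images, one chunk each.
        for data_uri in above_fold_images(results):
            yield f'<img src="{data_uri}" alt="">'

    yield "</body></html>"
```

In a real deployment each yielded string would be flushed to the connection immediately (for example as HTTP/1.1 chunked transfer encoding or HTTP/2 DATA frames); the sketch only models the ordering.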

The benefits of this delivery architecture are that 1) the critical path to First Render is very fast and overlapped with the “server think time”, and 2) for most searches, all content within the initial viewport can be rendered from just the base page HTML.

CDN

A CDN (Content Delivery Network) is a large, geographically distributed network of proxy servers that sit close to users and are well-connected to datacenters. Bing services are hosted in regional datacenters, and user traffic reaches them through a CDN.

Figure 2: Edge Node and Datacenter Topology

The CDN is critical to Bing’s performance. User traffic routes first to the closest CDN node (called an ‘edge node’). Based on the type of request, the edge node decides whether to proxy the request to a regional datacenter or to serve cached content itself. For static assets such as JavaScript (JS) files and images, the CDN provides caching closer to the user, which speeds their delivery during page load.
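The routing decision can be sketched as a toy model under stated assumptions: requests are classified by file extension (`STATIC_SUFFIXES` is an invented policy), the cache has no expiry or eviction, and `origin_fetch` stands in for the proxied call to the regional datacenter. Production CDNs typically drive this decision from configured rules and `Cache-Control` headers instead.

```python
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import urlsplit

# Assumed policy for illustration: cache only these file types at the edge.
STATIC_SUFFIXES = (".js", ".css", ".png", ".jpg", ".gif", ".svg", ".woff2")


@dataclass
class EdgeNode:
    origin_fetch: Callable[[str], bytes]  # proxies a request to the regional datacenter
    cache: dict[str, bytes] = field(default_factory=dict)

    def handle(self, url: str) -> tuple[str, bytes]:
        """Serve a request from the edge cache or proxy it to the datacenter."""
        if not urlsplit(url).path.endswith(STATIC_SUFFIXES):
            # Dynamic requests (e.g. a search query) always go to the datacenter.
            return "proxied", self.origin_fetch(url)
        if url in self.cache:
            return "edge-hit", self.cache[url]  # served without leaving the edge
        body = self.origin_fetch(url)
        self.cache[url] = body  # later requests for this asset hit the edge
        return "edge-miss", body
```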

Using a CDN also lowers latency on the user’s ISP (Internet Service Provider) network by keeping the connection distance short. This benefits all network requests between the user and Bing’s services in regional datacenters. Being closer to the user shortens connection establishment and recovery times. A CDN is also better able to detect issues in ISP networks and potentially move traffic to better edge nodes [1].

Between the edge node and the Bing datacenter, the CDN selects high-bandwidth, low-latency links and keeps connections to Bing’s services warm, ready to serve the next request with no additional time spent on connection management [2].

Figure 3: Network Path from User to Bing Datacenter

Rendering Sequence

This example shows the rendering sequence corresponding to the arrival of HTML chunks. This sequence is typically imperceptible to the user (fractions of a second) but is shown here in slow motion for illustrative purposes.

Figure 4: Rendering Sequence in Slow Motion

Here is another view, showing Visual Progress (%) as a function of time.

Figure 5: Visual Progress vs. Time

Key Front-End Optimizations

The search results page implements a multitude of performance optimization techniques, many of which are described here. While new optimization techniques are constantly being developed, it’s also important to take care of tried-and-true perf fundamentals.

Optimized Browser Rendering

The page is carefully constructed to deliver fast first render without waiting for the back-end stack. Additionally, because large and complex pages can incur a significant CPU cost in the browser layout engine, reducing time spent on layout calculations improves content rendering performance.

Chunk 1

The contents of Chunk 1 are carefully chosen to deliver a fast first render. Chunk 1 consists of the head tag and a portion of the body tag containing the page header. The head tag contains critical CSS and JS that need to run early in the page lifecycle. The page header includes elements like the logo, search box, and sign-in button. The page header is not query-specific and does not need to wait for the back-end stack (search index, etc.), which allows the front-end servers to generate and return this response very quickly via an early flush.
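As an illustration of why an early flush is possible, the sketch below builds Chunk 1 from nothing but build-time inputs, so it can be memoized with `functools.lru_cache` and sent without touching the back end. The `build_chunk1` name, the markup, and the `/signin` link are hypothetical placeholders for the real header.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def build_chunk1(critical_css: str, critical_js: str) -> bytes:
    """Assemble the query-independent first chunk of the results page.

    Nothing here depends on the search terms, so the bytes can be built once,
    cached, and flushed the moment a request arrives.
    """
    # critical_css / critical_js are trusted, build-time strings (not user input).
    page = (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        f"<style>{critical_css}</style>"
        f"<script>{critical_js}</script>"
        "</head><body>"
        '<header><a class="logo" href="/">Bing</a>'
        '<form action="/search"><input type="search" name="q"></form>'
        '<a class="signin" href="/signin">Sign in</a></header>'
    )
    return page.encode("utf-8")
```

The body tag is deliberately left open: Chunk 2 and later chunks continue the same document.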

Web Fonts

The page renders using custom Web Fonts. The head contains preload tags that tell the browser to kick off downloads for these fonts immediately. In the absence of these tags the browser would not download these fonts until it had parsed the CSS style declarations and encountered matching elements. Preloading ensures the fonts are already available by the time text needs to be rendered with them.
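A small helper that emits the preload hints might look like this. It assumes every font is a WOFF2 file (the `type` attribute lets browsers skip formats they cannot use), and the font paths in the test are invented. Note the `crossorigin` attribute: fonts are always fetched in CORS mode, so a preload without it does not match the later font request and the font ends up being downloaded twice.

```python
import html


def font_preload_tags(font_urls: list[str]) -> str:
    """Emit <link rel="preload"> hints so font downloads start while <head> is parsed."""
    return "".join(
        # `crossorigin` is required for fonts even on the same origin: font requests
        # use CORS mode, and a preload without it would not be reused.
        f'<link rel="preload" href="{html.escape(url)}" as="font" type="font/woff2" crossorigin>'
        for url in font_urls
    )
```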

Inline CSS

All critical CSS on the page is served inline in the HTML. Inlining CSS has the benefit of not needing to wait for additional network requests to arrive before the page contents can be styled and painted to the screen. The downside of inlining CSS is that it is not possible to leverage the browser cache to load shared styles on subsequent page loads. The tradeoff makes sense for Bing, but this conclusion is dependent on page construction and usage patterns. It may not make sense for every website.

Content-Visibility

Time spent on layout and style calculations increases on pages with large and complex DOM structures. The content-visibility CSS property can help mitigate this cost by deferring these calculations for elements that are not visible during initial page load. Bing uses content-visibility to reduce layout cost in several different places in both Chunk 1 and Chunk 2. Content-visibility has yielded measurable improvements to rendering performance.

Optimized Network Loading

Optimizing network loading includes minimizing transfer sizes, pipelining, parallelization, and caching.

Compression

Text-based resources such as HTML, CSS, and JS bundles are served with Brotli compression to minimize the number of bytes over the wire.
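Serving Brotli depends on the client advertising support in its `Accept-Encoding` request header. The sketch below picks `br` when offered, falls back to gzip, and otherwise sends the body uncompressed. The q-value parsing is deliberately simplified, and Brotli itself comes from the third-party `brotli` package (`brotli.compress`), since Python's standard library only ships gzip/zlib.

```python
import gzip


def choose_encoding(accept_encoding: str) -> str:
    """Pick a response encoding: Brotli if offered, else gzip, else identity.

    Simplified parsing: only a `q=` parameter is recognised, and q=0 means "refused".
    """
    offered: dict[str, float] = {}
    for part in accept_encoding.split(","):
        token, _, params = part.partition(";")
        token, params = token.strip().lower(), params.strip()
        if not token:
            continue
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        offered[token] = q
    for encoding in ("br", "gzip"):
        # An explicit entry wins; otherwise fall back to the "*" wildcard, if any.
        if offered.get(encoding, offered.get("*", 0.0)) > 0:
            return encoding
    return "identity"


def compress_body(body: bytes, encoding: str) -> bytes:
    """Compress a response body with the negotiated encoding."""
    if encoding == "br":
        import brotli  # third-party package (pip install brotli); not in the stdlib
        return brotli.compress(body)
    if encoding == "gzip":
        return gzip.compress(body)
    return body
```

Responses whose encoding varies like this should also carry `Vary: Accept-Encoding`, so that shared caches such as CDN edge nodes do not hand a Brotli body to a client that cannot decode it.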

Connection priming

Connection priming refers to opening a connection prior to it being needed. There are two forms of connection priming on Bing:

For the www.bing.com domain used to fetch the base page, the connection is primed as a side-effect of the search suggestions mechanism that is invoked while a user types a query.
Many of the page’s static resources are loaded from a separate domain (e.g., r.bing.com). These resources are not in the critical path to First Render and are typically discovered and requested later in the page load cycle. Chunk 1 includes a preconnect directive that lets the browser use the “idle time” before Chunk 2 arrives to open this connection ahead of time.
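Which origins deserve a `preconnect` hint can be derived from the list of static resources the page will eventually request. This is a hypothetical helper, not Bing's implementation, and the resource paths in the test are invented; `r.bing.com` is the static-resource domain named above.

```python
from urllib.parse import urlsplit


def preconnect_tags(page_origin: str, resource_urls: list[str]) -> str:
    """Emit one preconnect hint per cross-origin host the page will fetch from."""
    origins: list[str] = []
    for url in resource_urls:
        parts = urlsplit(url)
        if not parts.netloc:
            continue  # relative URL: same origin as the page, connection already open
        origin = f"{parts.scheme}://{parts.netloc}"
        if origin != page_origin and origin not in origins:
            origins.append(origin)  # keep first-seen order, no duplicates
    return "".join(f'<link rel="preconnect" href="{o}">' for o in origins)
```

If resources from that origin are fetched in CORS mode (fonts, for example), the hint also needs a `crossorigin` attribute; otherwise the browser opens a separate connection for those requests.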

Non-blocking JS loading

JavaScript bundles not needed to render content within the initial viewport are loaded in a non-blocking fashion. This ensures that browser rendering is never blocked by pending JS network requests.
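The post does not say which mechanism Bing uses for this. One standard way to load a bundle without blocking rendering is the `defer` (or `async`) attribute on an external script; a minimal sketch with a hypothetical helper name:

```python
import html


def non_blocking_script_tags(bundle_urls: list[str], mode: str = "defer") -> str:
    """Emit script tags that download without blocking HTML parsing or rendering."""
    if mode not in ("defer", "async"):
        raise ValueError("mode must be 'defer' or 'async'")
    return "".join(
        f'<script src="{html.escape(url)}" {mode}></script>' for url in bundle_urls
    )
```

`defer` scripts download in parallel with parsing and execute in document order after parsing finishes; `async` scripts run as soon as they arrive, in no guaranteed order. In both cases the parser does not stop to wait for the network.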

Browser caching

Cacheable static resources include a hash of the file contents in the URL. The URL acts as the cache key for browser caches, so this ensures cache consistency if the file contents change. This approach also allows setting long cache expiration times.
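A minimal sketch of content-addressed URLs, assuming SHA-256 truncated to 16 hex characters and a `name.<hash>.ext` naming scheme (both illustrative choices, as is the one-year `max-age`). The `immutable` directive tells supporting browsers not to revalidate the file even on reload.

```python
import hashlib
from pathlib import PurePosixPath

# One year; safe only because the URL changes whenever the content does.
LONG_LIVED_CACHE_CONTROL = "public, max-age=31536000, immutable"


def hashed_url(origin: str, path: str, content: bytes, digest_chars: int = 16) -> str:
    """Return a URL whose file name embeds a hash of the file's contents."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    p = PurePosixPath(path)
    # /rp/app.js -> /rp/app.<digest>.js
    return origin + str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```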

Optimized Image Loading

Images are a key visual component of the search results page. Loading images quickly and efficiently is critical to having a fast-loading page.

Shared image sprite

Commonly used icons are served using a shared image sprite. Delivering a single consolidated network request is more efficient than many small individual requests. The shared image sprite is cached for reuse on subsequent visits. Less commonly used icons are not included in the shared sprite and are served inline in the HTML.
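The CSS side of a sprite is just an offset per icon. This sketch assumes icons are stacked vertically in a single image with an optional gap between them; the class names (`sp`, `sp-<name>`) and the sprite path are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Icon:
    name: str
    width: int   # CSS pixels
    height: int  # CSS pixels


def sprite_css(icons: list[Icon], sprite_url: str, gap: int = 0) -> str:
    """Emit CSS for icons stacked top-to-bottom in one sprite image."""
    rules = [f".sp{{background-image:url({sprite_url});background-repeat:no-repeat}}"]
    y = 0
    for icon in icons:
        # Shift the shared image up so this icon's slice shows through the box.
        rules.append(
            f".sp-{icon.name}{{width:{icon.width}px;height:{icon.height}px;"
            f"background-position:0 {-y}px}}"
        )
        y += icon.height + gap
    return "\n".join(rules)
```

An icon is then rendered as `<i class="sp sp-search"></i>`, and the browser fetches the sprite image once no matter how many icons appear on the page.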

Automated image resizing

Given the dynamic nature and long-tail distribution of search queries, it isn’t practical to generate thumbnails for every possible use case of every image in the index ahead of time. Bing leverages an in-house image resizer service that generates properly sized thumbnail images on the fly. The service accepts various input parameters such as height, width, device pixel ratio and quality.
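On the page side, requesting a correctly sized thumbnail amounts to building a URL from the slot's CSS size and the device pixel ratio. The endpoint `images.example.com/resize` and the parameter names below are hypothetical; the post only says the service accepts height, width, device pixel ratio, and quality. `physical_size` shows the arithmetic a resizer would do: a 100 by 75 CSS-pixel slot at a pixel ratio of 1.5 needs a 150 by 113 image.

```python
import math
from urllib.parse import urlencode

RESIZER_ENDPOINT = "https://images.example.com/resize"  # hypothetical service URL


def thumbnail_url(image_id: str, css_width: int, css_height: int,
                  dpr: float = 1.0, quality: int = 80) -> str:
    """Build a request for a thumbnail sized to the slot it will fill (hypothetical API)."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")
    params = {"id": image_id, "w": css_width, "h": css_height, "dpr": dpr, "q": quality}
    return f"{RESIZER_ENDPOINT}?{urlencode(params)}"


def physical_size(css_width: int, css_height: int, dpr: float) -> tuple[int, int]:
    """Pixels the resizer must produce so the image stays sharp at this pixel ratio."""
    return math.ceil(css_width * dpr), math.ceil(css_height * dpr)
```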

Two-phase image loading

Thumbnail images within the initial viewport are loaded using a two-phase approach. The base HTML contains a lower-quality thumbnail embedded in Base64 format that enables the browser to quickly paint an initial image to the screen. The low-quality image is then replaced with a high-quality image that is fetched using a separate network request. The two-phase approach provides a good tradeoff between speed and quality: simply embedding high-quality images would significantly slow down delivery of the HTML.
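The server-side markup for the first phase could look like the sketch below: the low-quality bytes go inline as a Base64 data URI (JPEG is assumed), and the high-quality URL is parked in a `data-hq` attribute (a hypothetical name) for a small client-side script to fetch and swap in after first paint. Setting explicit `width` and `height` reserves the image's space, so the swap does not shift the layout.

```python
import base64
import html


def two_phase_img(lq_jpeg: bytes, hq_url: str, width: int, height: int, alt: str = "") -> str:
    """Markup for an above-the-fold thumbnail: inline low-quality image now, HQ URL for later."""
    data_uri = "data:image/jpeg;base64," + base64.b64encode(lq_jpeg).decode("ascii")
    return (
        f'<img src="{data_uri}" data-hq="{html.escape(hq_url)}" '
        f'width="{width}" height="{height}" alt="{html.escape(alt)}">'
    )
```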

Post-loading images outside the viewport

Thumbnail images outside the initial viewport are defer-loaded after the page has rendered. The front-end server skips embedding the low-quality image in the HTML, and the high-quality image is fetched directly using a deferred network request.
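Deciding which thumbnails get the two-phase treatment requires knowing which ones fall inside the initial viewport. A minimal sketch, assuming the server has an estimate of each thumbnail's vertical position and of the viewport height (how Bing actually makes this decision is not described here); `Thumb` and `plan_thumbnails` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Thumb:
    hq_url: str
    top_px: int  # estimated distance from the top of the page, in CSS pixels


def plan_thumbnails(thumbs: list[Thumb], viewport_height_px: int) -> tuple[list[Thumb], list[Thumb]]:
    """Split thumbnails into (two-phase, deferred) groups by initial-viewport visibility."""
    two_phase: list[Thumb] = []
    deferred: list[Thumb] = []
    for t in thumbs:
        if t.top_px < viewport_height_px:
            two_phase.append(t)  # inline low-quality image now, HQ image afterwards
        else:
            deferred.append(t)   # no inline bytes; HQ image requested after render
    return two_phase, deferred
```

Browsers also offer a built-in variant, `loading="lazy"`, which defers an image until it approaches the viewport rather than until the page has rendered.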

Preview image for maps

Map controls are complex and contain dynamic functionality such as the ability to pan and zoom. It can take a significant amount of time to load and execute the resources needed to render a map control. To mitigate this waiting period, the page loads and renders a preview image of the map location in the position where the map will eventually render. The fully interactive map replaces the preview image when it is done loading.

Staying Fast

Given the natural tendency for web pages (and indeed most software) to get slower over time due to new features and changing code, the performance mission is doomed to failure if there isn’t a strong emphasis on staying fast. Bing employs several practices to stay fast.

Org-wide perf metrics – Bing has institutionalized several key performance metrics and a culture of A/B experimentation to assess the impact of changes before they ship.
Continuous improvement – constant workstreams of investigation, prototyping, and optimization.
Challenging assumptions – conditions are always changing, including both page behavior and external conditions such as the speed of user networks and devices. Key pieces of technology that power the Web (the browser, web standards, and network protocols) are changing continuously as well. It is important to challenge assumptions from past performance decisions: what was relevant several years ago may no longer be relevant today.
“Perf Defense” – finally, Bing uses a rigorous set of tools and processes to prevent perf regressions and to quickly detect and recover from them when they occur. A “budget” also exists to allow small perf regressions under very limited circumstances. Although defense work may be less glamorous, it’s a critical part of the performance mission.
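The budget idea can be expressed as a simple gate on an A/B result. This is a sketch under invented parameters (`max_regression_ms` per change and a shared `remaining_budget_ms`); it compares point estimates only, whereas a real experimentation pipeline would also require the measured difference to be statistically significant.

```python
def perf_gate(control_ms: float, treatment_ms: float,
              max_regression_ms: float, remaining_budget_ms: float) -> tuple[str, float]:
    """Decide whether a change may ship based on an A/B latency comparison.

    Returns the decision and the budget left after it.
    """
    delta = treatment_ms - control_ms
    if delta <= 0:
        # Neutral or faster: ship without touching the budget.
        return "ship", remaining_budget_ms
    if delta <= max_regression_ms and delta <= remaining_budget_ms:
        # Small regression: allowed, but charged against the shared budget.
        return "ship-with-budget", remaining_budget_ms - delta
    # Too large, or the budget is exhausted: block the change.
    return "block", remaining_budget_ms
```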

We wish to acknowledge the countless people who have contributed to Bing’s architecture and performance over the years, both front-end and back-end!

- Paul Roy, Amiya Gupta, Mohit Suley

References

[1] Odin: Microsoft’s CDN Measurement System https://www.usenix.org/conference/nsdi18/presentation/calder
[2] LinkedIn wrote an excellent blog post describing these benefits in detail.