Bing delivers more contextualized search using quantized transformer inference on NVIDIA GPUs in Azure

Transformer models power a growing number of intelligent capabilities in Microsoft Bing, and their complexity has increased significantly over the last couple of years. To ensure Bing continues to deliver the fast, responsive, and relevant search experience our users expect, we optimized transformer inference for both latency and throughput using NVIDIA T4 GPUs in NCasT4v3 Azure VMs. These optimizations have enabled Microsoft Bing...
Read More
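
As a rough sketch of what post-training quantization can look like in practice (not the production pipeline described in the full post), the example below quantizes an exported transformer to INT8 with ONNX Runtime and runs it through a CUDA-backed inference session. The model file, tensor names, and the choice of ONNX Runtime are illustrative assumptions.

```python
# Minimal sketch: INT8 weight quantization of an exported transformer with
# ONNX Runtime, then GPU inference. Paths and tensor names are hypothetical.
import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize the FP32 model's weights to INT8 (dynamic quantization).
# Note: ONNX Runtime's dynamic quantization primarily targets CPU kernels;
# production GPU INT8 typically uses static quantization or TensorRT.
quantize_dynamic(
    model_input="ranker_fp32.onnx",   # hypothetical exported transformer
    model_output="ranker_int8.onnx",
    weight_type=QuantType.QInt8,
)

# Run the quantized model, preferring the CUDA execution provider and
# falling back to CPU for any unsupported operators.
session = ort.InferenceSession(
    "ranker_int8.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Hypothetical inputs for a sequence-classification-style transformer.
batch = {
    "input_ids": np.ones((1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
}
scores = session.run(None, batch)
print(scores[0].shape)
```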

RocksDB in Microsoft Bing

The Microsoft Bing platform has built one of the largest distributed storage systems for Bing web search data, using its homegrown ObjectStore service. The system hosts hundreds of petabytes of data and processes hundreds of millions of lookups per second. Open-source RocksDB is used as the storage engine, and multiple techniques are applied to store and process the massive data efficiently with sub-second data freshness. This blog will present those techniques and...
Read More
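
For readers new to RocksDB, the sketch below shows the embedded key-value API at the heart of the storage engine, here through the third-party python-rocksdb bindings. The database path, keys, and values are hypothetical, and this is not how ObjectStore actually integrates RocksDB.

```python
# Minimal sketch of RocksDB's embedded key-value API using the third-party
# python-rocksdb bindings; database path, keys, and values are hypothetical.
import rocksdb

opts = rocksdb.Options(create_if_missing=True)
db = rocksdb.DB("objectstore_shard.db", opts)  # hypothetical local shard

# Writes land in an in-memory memtable and a write-ahead log, then are
# flushed and compacted into immutable SST files in the background.
db.put(b"doc:12345", b"serialized web document metadata")

# Point lookups check the memtable, block cache, and SST files.
value = db.get(b"doc:12345")
print(value)

# Range scans iterate keys in sorted order.
it = db.iterkeys()
it.seek(b"doc:")
for key in it:
    print(key)
    break
```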

Welcome to the engineering blog

Bing and all search and recommendation experiences at Microsoft are powered by infrastructure that runs at extreme scale and speed. The platform team is a world-class engineering team with a presence around the world. Our mission is to build platforms that empower the scale and scenarios of search today.
Read More