
The transformer models that power a growing number of intelligent capabilities in Microsoft Bing have become significantly more complex over the last couple of years. To ensure Bing continues to deliver the fast, responsive, and relevant search experience our users expect, we optimized transformer inference for both latency and throughput using NVIDIA T4 GPUs in NCasT4_v3 Azure VMs. These optimizations have enabled Microsoft Bing...