
At Bing, we are always pushing the boundaries of search technology. Leveraging both Large Language Models (LLMs) and Small Language Models (SLMs) marks a significant milestone in enhancing our search capabilities. A key challenge with larger models is managing latency and serving cost. To address this, we have integrated NVIDIA TensorRT-LLM into our workflow to optimize SLM inference performance.