Evolving role of the index: From ranking pages to supporting answers

Same Foundations. Different Optimization Problems.

Introduction 

For decades, search engines have relied on large-scale indexing systems to help people discover information on the web. The infrastructure behind this – crawling billions of pages, evaluating content quality, ranking results by relevance – became the backbone of how people navigate the internet. This model has worked extraordinarily well, and it still does. 

But AI systems (for example, AI agents, companions, and generative AI answers embedded in search and apps) don't navigate the internet the way humans do. And that changes the indexing problem in fundamental ways. 

Two Systems, Two Responsibilities 

Traditional search and grounding systems share the same foundation – crawling, understanding, and ranking the web – but they are optimized for fundamentally different outcomes. 

Traditional search asks: which pages should a user visit? Grounding asks: what information can an AI system responsibly use to construct a response? These questions sound similar. They are not. The table below provides a breakdown of key considerations: 

Dimension | Traditional search | Grounding for AI responses
--- | --- | ---
Primary question | Which pages should a user visit? | What information can an AI system responsibly use to construct an answer?
Unit of value | The document (page) | Groundable information (discrete, supportable facts with clear provenance)
Role of the user | Human evaluates results and self-corrects | User sees a synthesized answer; independent verification requires checking the cited sources
Error dynamics | Imperfect ranking is tolerable; recovery is easy | Errors can compound across reasoning steps
Valid outcomes | Return ranked options | Answer when supported; abstain when evidence is insufficient
Accountability | Surface relevant options | Provide high-quality evidence that can support a committed answer
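The "errors can compound" row can be made concrete with a deliberately simple model. Assume, purely for illustration, that each of $n$ retrieval or reasoning steps is independently correct with probability $p$; real pipelines are rarely independent, so treat this as intuition rather than a measured property:

```latex
P(\text{all } n \text{ steps correct}) = p^{n},
\qquad p = 0.95,\; n = 5 \;\Rightarrow\; 0.95^{5} \approx 0.774
```

In this toy setting, a chain whose individual steps are right 95% of the time is fully correct only about 77% of the time.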

What Traditional Search Indexing Optimizes For

Traditional indexing answers a simple question: which pages should a user visit? The goal is recall and breadth – surface as many relevant options as possible and let the user choose. The unit of value is the document: a page that ranks well, that a human can skim, evaluate, and act on. 

That simplicity is a feature. Search was designed for humans who can scan a results page, skip the results that don’t fit, and course-correct in real time. The index does not need to be right about every result – it needs to be right enough that the user finds what they are looking for. 

What Changes When the Goal Is Grounding AI Answers 

Grounding an AI-generated answer introduces a fundamentally different constraint: the system is no longer just pointing to information; it is using it. The goal shifts from “fetch the best documents” to “fetch the best information to synthesize into a reliable, verifiable answer.” 

Instead of just ranking pages, the index must help an AI system determine which specific information can responsibly support an answer. The unit of value shifts from documents to groundable information – discrete, supportable facts with clear provenance. When an AI system presents an answer, multiple sources might collapse into a single statement, and errors can compound across reasoning steps. Grounding therefore emphasizes high-quality source identification and attribution so users (and downstream systems) can verify what was used, validate claims, and follow the evidence further when needed. In this setting, the system must decide not only what to answer, but whether the evidence is sufficient to answer at all. Abstention is a valid outcome when support is missing, stale, or conflicting – it reflects a deliberate judgment about what the available evidence can justify.
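The answer-or-abstain decision described above can be sketched as a small policy function. This is a minimal illustration under stated assumptions, not how any production system decides: the `Evidence` and `Decision` types, the `answer_or_abstain` function, and the 30-day `max_age` freshness window are all hypothetical, and real systems weigh evidence far more finely than a set comparison.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    """One retrieved piece of evidence for a single factual question (hypothetical)."""
    source_url: str         # provenance, kept for attribution
    value: str              # the value this source supports, e.g. "330 m"
    retrieved_at: datetime  # when the index last confirmed this content


@dataclass(frozen=True)
class Decision:
    answer: Optional[str]        # None means the system abstains
    citations: tuple[str, ...]   # sources behind the answer (or behind the conflict)
    reason: str                  # "supported", "missing", "stale" or "conflicting"


def answer_or_abstain(evidence: list[Evidence], now: datetime,
                      max_age: timedelta = timedelta(days=30)) -> Decision:
    """Answer only when fresh evidence agrees; otherwise abstain and record why."""
    if not evidence:
        return Decision(None, (), "missing")
    fresh = [e for e in evidence if now - e.retrieved_at <= max_age]
    if not fresh:
        return Decision(None, (), "stale")
    sources = tuple(sorted({e.source_url for e in fresh}))
    if len({e.value for e in fresh}) > 1:
        # Surface the disagreement instead of silently picking a winner.
        return Decision(None, sources, "conflicting")
    return Decision(fresh[0].value, sources, "supported")


now = datetime(2025, 11, 1, tzinfo=timezone.utc)
recent = now - timedelta(days=2)
decision = answer_or_abstain(
    [Evidence("https://example.com/a", "330 m", recent),
     Evidence("https://example.com/b", "330 m", recent)],
    now,
)
print(decision.answer, decision.reason)  # 330 m supported
```

Note that abstention still carries a reason and, for conflicts, the competing sources, so the "no answer" outcome remains attributable rather than silent.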

What the Index Must Measure Differently 

This is where the two systems diverge most concretely. The metrics a search system optimizes for are not the metrics grounding needs to track – and closing that gap requires rethinking what “index quality” means from the ground up. 

What to measure | In traditional search | In grounding
--- | --- | ---
Factual fidelity | Ranking can tolerate some mismatch; the user can click through and interpret | Critical: chunking and transformations must preserve the meaning and claims used in the answer
Source attribution quality | Attribution is helpful, but users choose what to trust | Core signal: evidence needs clear provenance, and sources carry different evidentiary weight
Freshness | Stale content mainly degrades ranking usefulness | Stale facts can directly produce wrong answers
Coverage of high-value facts | Coverage is broad; missing a document is often recoverable via alternative results | Must ensure the facts and sources people ask about are actually retrievable and groundable
Contradictions / conflict | Can surface one source above another and let the user arbitrate | Must detect and represent conflict; silent arbitration risks confident wrong answers

Traditional search quality is largely measured through user behavior and ranking performance. The index asks: is the most relevant content being surfaced at the top? Are users finding what they need? Is content fresh enough to be useful for ranking? Are near-duplicate pages being collapsed efficiently? These measures all assume a human in the loop who can scan, skip, and self-correct. A stale result is a ranking problem. A missed document is a coverage gap. Both matter – but neither is catastrophic, because the user can recover. 

Grounding changes what the index needs to account for, in ways that are both more demanding and harder to measure. Factual fidelity becomes critical: does the indexed representation of a page accurately preserve the meaning of the original content? Breaking content into retrievable chunks and transforming it for fast lookup can distort what a page actually says, in ways that never show up in any ranking signal. Source attribution quality matters in an entirely new way – not all indexed content carries equal evidentiary weight for an AI answer, and the index needs to understand that distinction. 
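To make the fidelity concern concrete, here is a minimal chunking sketch, assuming a simple design in which chunks are built only from whole sentences and each chunk keeps its source URL and character offsets into the original page. The names `Chunk`, `chunk_page`, and `is_faithful`, the 120-character default budget, and the regex sentence splitter (which will mis-split abbreviations such as "Dr.") are illustrative assumptions, not a description of any production indexer.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    source_url: str  # provenance travels with every chunk
    start: int       # character offset of the chunk in the original page text
    end: int         # exclusive end offset
    text: str


def sentence_spans(text: str) -> list[tuple[int, int]]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    spans, pos = [], 0
    for m in re.finditer(r"(?<=[.!?])\s+", text):
        spans.append((pos, m.start()))
        pos = m.end()
    if pos < len(text):
        spans.append((pos, len(text)))
    return spans


def chunk_page(source_url: str, text: str, max_chars: int = 120) -> list[Chunk]:
    """Greedily pack whole sentences into chunks so no claim is cut mid-sentence."""
    chunks: list[Chunk] = []
    cur: Optional[tuple[int, int]] = None
    for s, e in sentence_spans(text):
        if cur is not None and e - cur[0] > max_chars:
            chunks.append(Chunk(source_url, cur[0], cur[1], text[cur[0]:cur[1]]))
            cur = None
        cur = (s, e) if cur is None else (cur[0], e)
    if cur is not None:
        chunks.append(Chunk(source_url, cur[0], cur[1], text[cur[0]:cur[1]]))
    return chunks


def is_faithful(chunk: Chunk, original: str) -> bool:
    """Fidelity check: indexed text must match the original at its recorded offsets."""
    return original[chunk.start:chunk.end] == chunk.text


page = "The tower opened in 1889. It is 330 metres tall. Tickets cost more in summer."
chunks = chunk_page("https://example.com/tower", page, max_chars=50)
print([c.text for c in chunks])
# Simulate a downstream transformation that silently changes a fact.
altered = replace(chunks[0], text=chunks[0].text.replace("1889", "1898"))
print(is_faithful(chunks[0], page), is_faithful(altered, page))  # True False
```

Because every chunk records where it came from, a later transformation step (normalization, summarization, translation) can be checked against the original. The check catches the altered date in the usage above, a change that would not register in any ranking signal.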

Freshness failure carries a categorically different cost. In search, stale content degrades ranking. In grounding, a stale fact can directly produce a misleading answer. The index must also account for coverage gaps in high-value content – not just whether the web is broadly indexed, but whether the specific facts and sources that people are likely to ask about are actually available and groundable. And when two indexed sources contradict each other, a grounding index cannot simply surface one above the other and move on. It needs to register that conflict, because an AI system that silently arbitrates between contradictory sources may confidently assert the wrong thing. 
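Registering conflict, rather than resolving it silently, can be sketched as an index-side grouping step. In this hypothetical example, claims extracted from pages are keyed by (entity, attribute); any key with more than one distinct value is kept as a conflict record that lists every value alongside its sources, so a downstream system can present the disagreement or abstain. The `Claim` type, the `find_conflicts` function, and the example data are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    entity: str      # e.g. "Example Tower"
    attribute: str   # e.g. "height"
    value: str       # e.g. "330 m"
    source_url: str  # provenance for attribution


def find_conflicts(claims: list[Claim]) -> dict[tuple[str, str], dict[str, list[str]]]:
    """Return {(entity, attribute): {value: [sources]}} for keys with disagreeing values."""
    grouped: dict[tuple[str, str], dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for c in claims:
        grouped[(c.entity, c.attribute)][c.value].append(c.source_url)
    # Keep every competing value with its sources; nothing is ranked away.
    return {key: {v: sorted(srcs) for v, srcs in values.items()}
            for key, values in grouped.items() if len(values) > 1}


claims = [
    Claim("Example Tower", "height", "330 m", "https://example.com/a"),
    Claim("Example Tower", "height", "324 m", "https://example.com/b"),
    Claim("Example Tower", "height", "330 m", "https://example.com/c"),
    Claim("Example Tower", "opened", "1889", "https://example.com/a"),
]
print(find_conflicts(claims))
```

Whether a recorded conflict reflects a genuine disagreement, a unit mismatch, or a fact that changed over time is a separate judgment this sketch does not attempt.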

The shift in what gets measured reflects a deeper shift in what the index is responsible for. A search function is accountable for surfacing options. A grounding function is accountable for the quality of the evidence it gives an AI system that will commit to an answer, one that users may later verify.

Grounding Builds on Search 

A common misconception is that grounding replaces search. It does not. Grounding builds on the same foundational infrastructure – the same crawlers, the same quality signals, the same deep understanding of the web – but it adds a new optimization layer on top. 

Grounding is about determining what information can responsibly support a claim, and having the discipline to withhold an answer when the evidence is not there. The infrastructure is shared. The purpose is different. 

Retrieval Becomes a System, Not a Step 

Traditional search is typically a single interaction: query in, ranked results out. The simplicity of that model is a feature – it is fast, predictable, and easy to reason about. 

Grounding operates in loops. A system grounding an AI answer may need to ask follow-up questions, refine retrieval based on intermediate results, combine evidence from multiple sources, and re-evaluate when confidence is low. This changes the error profile of the index entirely, and of its retrieval systems in particular. If early retrieval steps introduce subtle errors, those errors compound through subsequent reasoning steps in ways that no human reviewer would catch in real time. Grounding systems cannot rely on the safety net that search provides, where a user scans results, skips irrelevant hits, and course-corrects on the fly. Retrieval systems must therefore optimize not just for one-shot retrieval, but for consistent, repeatable behavior across iterative use. 
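The loop described above can be sketched as bounded, iterative retrieval that stops when evidence is strong enough and abstains when it never gets there. Everything here is hypothetical: `retrieve` and `refine` stand in for a real retriever and query rewriter, the `score` field is a placeholder for whatever evidence-sufficiency estimate a system uses, and the 0.8 threshold and three-round cap are arbitrary. Breaking ties deterministically (by score, then document id) is one simple way to get the consistent, repeatable behavior the paragraph calls for.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str
    score: float  # hypothetical evidence-strength score in [0, 1]


def grounded_retrieve(query: str,
                      retrieve: Callable[[str], list[Passage]],
                      refine: Callable[[str, list[Passage]], str],
                      threshold: float = 0.8,
                      max_rounds: int = 3) -> Optional[list[Passage]]:
    """Retrieve, check sufficiency, refine, repeat; return None to signal abstention."""
    evidence: dict[str, Passage] = {}
    for _ in range(max_rounds):
        for p in retrieve(query):
            # Keep the strongest version of each document seen across rounds.
            if p.doc_id not in evidence or p.score > evidence[p.doc_id].score:
                evidence[p.doc_id] = p
        # Deterministic ordering: by score, then doc_id, so reruns behave identically.
        ranked = sorted(evidence.values(), key=lambda p: (-p.score, p.doc_id))
        if ranked and ranked[0].score >= threshold:
            return ranked
        query = refine(query, ranked)  # use intermediate results to sharpen the query
    return None  # evidence never became sufficient: abstain


index = {
    "tower height": [Passage("d1", "The tower is tall.", 0.4)],
    "tower height metres": [Passage("d2", "The tower is 330 metres tall.", 0.9)],
}
result = grounded_retrieve(
    "tower height",
    retrieve=lambda q: index.get(q, []),
    refine=lambda q, ranked: q + " metres",
)
print([p.doc_id for p in result])  # ['d2', 'd1']
```

The round cap matters as much as the threshold: without it, a loop that keeps refining toward weak evidence never reaches the abstain outcome.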

The Bigger Picture 

Indexing for grounded AI answers is not a reinvention of search – it is a major evolution of it, and not a surface-level one, because grounding commits to an answer. 

The shift we described at the opening is worth restating plainly: search indexing was built to help humans decide what to read. Grounding indexing is being built to help AI systems decide what to say. The infrastructure required to do those two things well is not the same – even when it starts from the same foundation. 

What makes this hard is not the technology gap – it is the measurement gap. We have decades of practice measuring search quality. We are still learning what it means to measure grounding quality rigorously: not just whether an answer was retrieved, but whether the evidence behind it was accurate, fresh, attributable, and consistent. 

Search optimizes for likelihood of relevance. Grounding must measure strength of evidence. Understanding that difference is not just an engineering concern. It is the starting point for building AI systems that people can actually trust. For a practical perspective on what this shift means for content creators, see our November 2025 blog post, Optimizing content for inclusion in the era of AI, which outlines concrete steps to make information easier to interpret, cite, and verify in AI experiences. Bing Webmaster Tools can complement that guidance by helping you use real performance data to refine what you publish and how you structure it for AI-driven discovery.

Krishna Madhavan, Knut Risvik, Meenaz Merchant
Microsoft AI