Same Foundations. Different Optimization Problems.
Introduction
For decades, search engines have relied on large-scale indexing systems to help people discover information on the web. The infrastructure behind this – crawling billions of pages, evaluating content quality, ranking results by relevance – became the backbone of how people navigate the internet. This model has worked extraordinarily well, and it still does.
But AI systems (for example, AI agents, companions, and generative AI answers embedded in search and apps) don't navigate the internet the way humans do. And that changes the indexing problem in fundamental ways.
Two Systems, Two Responsibilities
Traditional search and grounding systems share the same foundation – crawling, understanding, and ranking the web – but they are optimized for fundamentally different outcomes.
Traditional search asks: which pages should a user visit? Grounding asks: what information can an AI system responsibly use to construct a response? These questions sound similar. They are not. The table below provides a breakdown of key considerations:
| Dimension | Traditional search | Grounding for AI responses |
| --- | --- | --- |
| Primary question | Which pages should a user visit? | What information can an AI system responsibly use to construct an answer? |
| Unit of value | The document (page) | Groundable information (discrete, supportable facts with clear provenance) |
| Role of the user | Human evaluates results and self-corrects | User sees a synthesized answer; independent verification requires checking the cited sources |
| Error dynamics | Imperfect ranking is tolerable; recovery is easy | Errors can compound across reasoning steps |
| Valid outcomes | Return ranked options | Answer when supported; abstain when evidence is insufficient |
| Accountability | Surface relevant options | Provide high-quality evidence that can support a committed answer |
What Traditional Search Indexing Optimizes For
Traditional indexing answers a simple question: which pages should a user visit? The goal is recall and breadth – surface as many relevant options as possible and let the user choose. The unit of value is the document: a page that ranks well, that a human can skim, evaluate, and act on.
That simplicity is a feature. Search was designed for humans who can scan a results page, skip the results that don’t fit, and course-correct in real time. The index does not need to be right about every result – it needs to be right enough that the user finds what they are looking for.
What Changes When the Goal Is Grounding AI Answers
Grounding an AI-generated answer introduces a fundamentally different constraint: the system is no longer just pointing to information, it is using it. The goal shifts from “fetch the best documents” to “fetch the best information to synthesize into a reliable, verifiable answer.”
Instead of just ranking pages, the index must help an AI system determine which specific information can responsibly support an answer. The unit of value shifts from documents to groundable information – discrete, supportable facts with clear provenance. When an AI system presents an answer, multiple sources might collapse into a single statement, and errors can compound across reasoning steps. Grounding therefore emphasizes high-quality source identification and attribution, so that users (and downstream systems) can verify what was used, validate claims, and follow the evidence when needed. In this setting, the system must decide not only what to answer, but whether the evidence is sufficient to answer at all. Abstention is a valid outcome when support is missing, stale, or conflicting – it reflects a deliberate judgment about what the available evidence can justify.
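To make the answer-or-abstain decision concrete, here is a minimal Python sketch. It is an illustration only, not a description of any production system: the `Evidence` record, the `decide` function, and the one-year `max_age_days` cutoff are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Evidence:
    claim: str       # normalized statement the passage supports (hypothetical field)
    source: str      # provenance: where the statement came from
    published: date  # used for the freshness check


def decide(evidence, today, max_age_days=365):
    """Return (status, detail, sources); status is "answer" or "abstain"."""
    if not evidence:
        return ("abstain", "missing evidence", [])
    # Keep only evidence recent enough to trust (the cutoff is an assumption).
    fresh = [e for e in evidence if (today - e.published).days <= max_age_days]
    if not fresh:
        return ("abstain", "stale evidence", [])
    # More than one distinct claim means the sources disagree.
    if len({e.claim for e in fresh}) > 1:
        return ("abstain", "conflicting evidence", [])
    # Supported: return the claim together with its attributable sources.
    return ("answer", fresh[0].claim, sorted({e.source for e in fresh}))
```

Calling `decide` with two fresh, agreeing records returns the claim and both sources; an empty list, only outdated records, or records that disagree each return an `"abstain"` status with the reason.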
What the Index Must Measure Differently
This is where the two systems diverge most concretely. The metrics a search index optimizes for are not the same metrics a grounding index needs to track – and closing that gap requires rethinking what “index quality” means from the ground up.
| What to measure | In traditional search | In grounding |
| --- | --- | --- |
| Factual fidelity | Ranking can tolerate some mismatch; the user can click through and interpret | Critical: chunking/transformations must preserve meaning and claims used in the answer |
| Source attribution quality | Attribution is helpful, but users | Core signal: evidence needs clear provenance and varying evidentiary weight |
| Freshness | Stale content mainly degrades | Stale facts can directly produce wrong answers |
| Coverage of high-value facts | Coverage is broad; missing a | Must ensure facts and sources people ask about are actually retrievable and groundable |
| Contradictions / conflict | Can surface one source above | Must detect and represent conflict; silent arbitration risks confident wrong answers |
Traditional search quality is largely measured through user behavior and ranking performance. The index asks: is the most relevant content being surfaced at the top? Are users finding what they need? Is content fresh enough to be useful for ranking? Are near-duplicate pages being collapsed efficiently? These measures all assume a human in the loop who can scan, skip, and self-correct. A stale result is a ranking problem. A missed document is a coverage gap. Both matter – but neither is catastrophic, because the user can recover.
Grounding changes what the index needs to account for, in ways that are both more demanding and harder to measure. Factual fidelity becomes critical: does the indexed representation of a page accurately preserve the meaning of the original content? The processes of breaking content into retrievable chunks and transforming it for fast lookup can distort page substance in ways that never appear in any ranking signal. Source attribution quality matters in an entirely new way – not all indexed content carries equal evidentiary weight for an AI answer, and the index needs to understand that distinction.
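One narrow slice of factual fidelity can be checked mechanically: whether chunking loses or alters text, and whether each chunk can be traced back to its place in the source. The sketch below is a simplified illustration under stated assumptions; `Chunk`, `chunk_page`, and `is_lossless` are hypothetical names, and the sentence splitter is deliberately naive. It does not measure whether meaning survives, which is the harder problem described above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str  # page the chunk came from (provenance)
    start: int   # character offset in the original text
    end: int
    text: str


def chunk_page(source, text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Naive sentence spans: split after ., ! or ? followed by whitespace.
    spans, start = [], 0
    for m in re.finditer(r"[.!?]\s+", text):
        spans.append((start, m.end()))
        start = m.end()
    if start < len(text):
        spans.append((start, len(text)))

    chunks, cur = [], None
    for s, e in spans:
        if cur is None:
            cur = (s, e)
        elif e - cur[0] <= max_chars:
            cur = (cur[0], e)  # sentence still fits in the current chunk
        else:
            chunks.append(Chunk(source, cur[0], cur[1], text[cur[0]:cur[1]]))
            cur = (s, e)
    if cur is not None:
        chunks.append(Chunk(source, cur[0], cur[1], text[cur[0]:cur[1]]))
    return chunks


def is_lossless(text, chunks):
    """True if the chunks, concatenated in order, reproduce the original."""
    return "".join(c.text for c in chunks) == text
```

Keeping offsets on every chunk means a statement used in an answer can be traced to an exact span of the source page, and the round-trip check catches text dropped or altered during chunking.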
Freshness failures carry a categorically different cost. In search, stale content degrades ranking. In grounding, a stale fact produces a misleading response. The index must also account for coverage gaps in high-value content – not just whether the web is broadly indexed, but whether the specific facts and sources that people are likely to ask about are actually available and groundable. And when two indexed sources contradict each other, a grounding index cannot simply surface one above the other and move on. It needs to register that conflict, because an AI system that silently arbitrates between contradictory sources is one that may confidently assert the wrong thing.
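As a rough illustration of what registering a conflict could look like, the Python sketch below groups extracted facts by what they describe and keeps every competing value together with its sources, instead of picking a winner. The `(subject, attribute, value, source)` tuple shape and the `find_conflicts` name are assumptions made for this example.

```python
from collections import defaultdict


def find_conflicts(facts):
    """Map each (subject, attribute) with disagreeing values to its sources.

    facts: iterable of (subject, attribute, value, source) tuples.
    Returns {(subject, attribute): {value: [sources...]}} for conflicts only.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for subject, attribute, value, source in facts:
        seen[(subject, attribute)][value].add(source)
    # A conflict exists when one key has more than one distinct value.
    return {
        key: {value: sorted(sources) for value, sources in by_value.items()}
        for key, by_value in seen.items()
        if len(by_value) > 1
    }
```

Given two sources that state different values for the same attribute, the result lists both values with their sources, so a downstream system can present the disagreement or abstain rather than arbitrate silently.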
The shift in what gets measured reflects a deeper shift in what the index is responsible for. A search function is accountable for surfacing options. A grounding function is accountable for the quality of evidence it provides to an AI system that will commit to an answer that users may subsequently verify.
Grounding Builds on Search
A common misconception is that grounding replaces search. It does not. Grounding builds on the same foundational infrastructure – the same crawlers, the same quality signals, the same deep understanding of the web – but it adds a new optimization layer on top.
Grounding is about determining what information can responsibly support a claim and having the discipline to withhold an answer when the evidence is not there. The infrastructure is shared. The purpose is different.
Retrieval Becomes a System, Not a Step
Traditional search is typically a single interaction: query in, ranked results out. The simplicity of that model is a feature – it is fast, predictable, and easy to reason about.
Grounding operates in loops. A system grounding an AI answer may need to ask follow-up questions, refine retrieval based on intermediate results, combine evidence from multiple sources, and re-evaluate when confidence is low. This changes the error profile of the index entirely, and of the retrieval systems in particular. If early retrieval steps introduce subtle errors, those errors compound through subsequent reasoning steps in ways that no human reviewer would catch in real time. Grounding systems cannot rely on the safety net that search provides – where a user scans results, skips irrelevant hits, and course-corrects on the fly. Retrieval systems must therefore optimize not just for one-shot retrieval, but for consistent, repeatable behavior across iterative use.
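The loop described above can be sketched in a few lines of Python. This is a toy illustration rather than a real retrieval stack: `ground`, the three callables it takes, the `0.8` threshold, and the round limit are all hypothetical choices made for the example.

```python
def ground(query, retrieve, refine, confidence, threshold=0.8, max_rounds=3):
    """Retrieve, check confidence, refine the query, and repeat.

    retrieve(query) -> list of evidence items
    refine(query, evidence) -> follow-up query
    confidence(evidence) -> float in [0, 1]
    """
    evidence = []
    for round_no in range(1, max_rounds + 1):
        evidence.extend(retrieve(query))
        if confidence(evidence) >= threshold:
            return {"status": "answer", "evidence": evidence, "rounds": round_no}
        # Not enough support yet: ask a follow-up and try again.
        query = refine(query, evidence)
    # The round budget is exhausted without sufficient support.
    return {"status": "abstain", "evidence": evidence, "rounds": max_rounds}


# Toy stand-ins for an index, a query refiner, and a confidence estimate.
TOY_INDEX = {
    "capital of x": ["doc-a"],
    "capital of x official": ["doc-b", "doc-c"],
}


def toy_retrieve(query):
    return TOY_INDEX.get(query, [])


def toy_refine(query, evidence):
    return query + " official"


def toy_confidence(evidence):
    return min(1.0, len(set(evidence)) / 3)


result = ground("capital of x", toy_retrieve, toy_refine, toy_confidence)
```

Because every round feeds the next, a poor result in round one shapes the refined query in round two, which is why repeatable behavior across iterations matters as much as one-shot quality.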
The Bigger Picture
Indexing for grounded AI answers is not a reinvention of search – it is a major evolution of it, and not a surface-level one. Search returns ranked options; grounding commits to an answer.
The shift we described at the opening is worth restating plainly: search indexing was built to help humans decide what to read. Grounding indexing is being built to help AI systems decide what to say. The infrastructure required to do those two things well is not the same – even when it starts from the same foundation.
What makes this hard is not the technology gap – it is the measurement gap. We have decades of practice measuring search quality. We are still learning what it means to measure grounding quality rigorously: not just whether an answer was retrieved, but whether the evidence behind it was accurate, fresh, attributable, and consistent.
Search optimizes for likelihood of relevance. Grounding must measure strength of evidence. Understanding that difference is not just an engineering concern. It is the starting point for building AI systems that people can actually trust.
For a practical perspective on what this shift means for content creators, see our blog post from November 2025 on Optimizing content for inclusion in the era of AI, which outlines concrete steps to make information easier to interpret, cite, and verify in AI experiences. Bing Webmaster Tools can complement that guidance by helping you use real performance data to refine what you publish and how you structure it for AI-driven discovery.
Krishna Madhavan, Knut Risvik, Meenaz Merchant
Microsoft AI
