Grounding Copilot Studio on your docs: three places vector indexing silently fails

Pointing an agent at a vector index looks like configuration work. It is design work - and the defaults are wrong for documentation. Three failure modes that produce a finished-looking index and an agent that finds nothing.

Published: June 2026
Length: 7 min read
Topics: AI Orchestration · RAG · Azure

Retrieval-augmented generation has a demo problem: it looks done long before it works. You stand up an Azure AI Search index, point Copilot Studio at it, ask a question from the documentation, and get a plausible answer. Then a real user asks a real question and the agent shrugs. The index is green. The connector is green. Nothing is wrong - except retrieval.

The corpus that taught me this was an eleven-file markdown knowledge pack for an analytics agent: a business glossary, per-measure definitions grouped under display-folder headings, source-system descriptions, scenario guides. Well-structured content, already atomized into one concept per section — which is exactly the kind of corpus the default tooling quietly ruins. Grounding the agent on it failed in three specific places, and each one failed silently.

The wizard chunks by size, not meaning

The portal's import-and-vectorize wizard is the natural first move, and it is the first mistake. Its chunking is built in and nonconfigurable, and the docs now print the effective settings outright:

"textSplitMode": "pages",
"maximumPageLength": 2000,
"pageOverlapLength": 500,
"unit": "characters"

A fixed two-thousand-character window with a quarter overlap, with no regard for where one concept ends and the next begins. To be fair to the default: for narrative prose it's a reasonable start, and Microsoft's own chunking guidance recommends roughly 512 tokens (~2,000 characters) with about 25% overlap as the opening position. But documentation isn't narrative prose. For a measure glossary, the fixed window welded ten or more short metric definitions into single blobs and sliced the long system-description docs mid-section. A question about one metric retrieved a chunk containing a dozen, and the model paraphrased whichever definition it liked.
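To see the welding effect without standing up an index, here is a minimal Python sketch of a fixed-size splitter using the same 2,000 / 500 numbers. It is a simplification, not the skill's actual implementation: it cuts on raw character offsets and ignores whatever boundary handling the real splitter applies. The glossary text and the function name `split_fixed` are made up for illustration.

```python
def split_fixed(text: str, page_len: int = 2000, overlap: int = 500) -> list[str]:
    """Cut text into fixed character windows; each window starts
    page_len - overlap characters after the previous one."""
    step = page_len - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + page_len])
        if start + page_len >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Hypothetical glossary: 30 short measure definitions, about 150 characters each.
glossary = "".join(
    f"### Measure {i}\n" + "Definition text. " * 8 + "\n\n" for i in range(30)
)

chunks = split_fixed(glossary)
headings_per_chunk = [c.count("### Measure") for c in chunks]
print(len(chunks), headings_per_chunk)
```

Every chunk in this toy run carries several measure headings, which is the same shape of problem the real glossary hit: the retriever has no way to return one definition without its neighbors.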

Documentation has structure; the chunking has to follow it. Azure AI Search's blob indexer has a markdown parsing mode that does this natively — no skillset gymnastics required:

"parameters": {
  "configuration": {
    "parsingMode": "markdown",
    "markdownParsingSubmode": "oneToMany",
    "markdownHeaderDepth": "h3"
  }
}

oneToMany makes one search document per content section instead of one per file. The header-depth setting is the design decision: in this corpus # was the domain, ## the display folder or glossary term, ### the individual measure — so depth h3 produced exactly one document per measure, with the parser lifting the heading path into sections/h1, sections/h2, sections/h3 fields. Files with no ### headings (the glossary, the scenario guide) simply split at ## — one document per term. The same one-line setting gave every file in the corpus its natural granularity, and promoting h1 to a filterable domain field came free.
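For intuition about what depth h3 buys you, the following Python sketch imitates heading-based splitting on a small sample. It is an illustration of the idea only, not the indexer's parser: the output keys `h1`, `h2`, `h3` stand in for the sections fields, the sample content (apart from the Average Engagement Score name) is invented, and skipping headings that have no body text is my assumption rather than documented behavior.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str, max_depth: int = 3) -> list[dict]:
    """Emit one dict per section, splitting at headings up to max_depth."""
    docs = []
    path = {}   # heading level -> current heading text
    body = []   # lines collected for the current section

    def flush():
        text = "\n".join(body).strip()
        if text:  # assumption: sections with no body produce no document
            doc = {f"h{lvl}": path[lvl] for lvl in sorted(path)}
            doc["content"] = text
            docs.append(doc)
        body.clear()

    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m and len(m.group(1)) <= max_depth:
            flush()
            level = len(m.group(1))
            # A new heading invalidates headings at the same or deeper levels.
            for lvl in [l for l in path if l >= level]:
                del path[lvl]
            path[level] = m.group(2).strip()
        else:
            body.append(line)
    flush()
    return docs


sample = """# Engagement
## Scores
### Average Engagement Score
Mean of all survey scores.
### Median Engagement Score
Middle survey score.
## Glossary Term
A term defined directly under an h2 heading.
"""

docs = split_by_headings(sample)
for d in docs:
    print(d)
```

The two measures come out as separate documents with a three-level heading path, and the term that sits directly under a ## heading comes out as its own document with no h3, mirroring how the mixed-depth files in this corpus behaved.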

The retriever can only be as precise as the boundaries you give it.

A chunk that doesn't know its own name

Splitting on headings introduces the second trap, and it's a direct consequence of how the parser works: the heading is not part of the body. Markdown parsing puts the section text in content and the heading path in separate sections fields — so the chunk describing Average Engagement Score may never contain the words "Average Engagement Score." Embed only content and the measure's name isn't in its own vector. Name-based questions — which is most of them, for a glossary — match poorly or not at all.

The fix is mechanical once you see it: feed the embedding skill a concatenation of the heading path and the body — h1 + h2 + h3 + content — so every chunk carries its own name and its place in the hierarchy into vector space. Microsoft's chunking guidance gestures at the same idea, recommending you append document titles to chunks to prevent context loss. For content whose whole job is to define named things, it isn't optional polish; it's the difference between a glossary that answers and one that doesn't.
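As a rough sketch of that concatenation, assuming each section arrives as a dict with optional `h1`, `h2`, `h3` keys and a `content` key (hypothetical names, as is the " > " separator), the text handed to the embedding model could be built like this. Where in the pipeline the concatenation happens is left open here; this only shows the string that should end up being embedded.

```python
def embedding_text(doc: dict) -> str:
    """Prefix a section's body with its heading path so the name is embedded too."""
    path = [doc[key] for key in ("h1", "h2", "h3") if doc.get(key)]
    if not path:
        return doc["content"]
    return " > ".join(path) + "\n" + doc["content"]


section = {
    "h1": "Engagement",
    "h2": "Scores",
    "h3": "Average Engagement Score",
    "content": "Mean of all survey scores.",
}

print(embedding_text(section))
```

The body alone never mentions the measure by name; the combined string does, which is what a name-based question needs to land on.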

The vectorizer that wasn't there

The quietest failure: an index can hold perfectly embedded documents and still have no way to embed the question. In Azure AI Search, query-time text-to-vector conversion is a separate piece of index configuration (a vectorizer, attached to the vector field through a vector profile), and nothing warns you it's missing. Copilot Studio sends plain-text queries and relies on the index to vectorize the incoming prompt at runtime with the same embedding model used at indexing time. No vectorizer, no vector retrieval: the agent degrades into confident vagueness built on whatever keyword matches survive.

There's an irony here: the wizard I just told you not to use for chunking wires up a vectorizer automatically, matched to your embedding deployment by design. Build your own indexer pipeline for better chunks — as this project did — and the query side becomes yours to remember. Documents go in through your pipeline; questions arrive with no pipeline at all.

Two rules keep it honest. The vectorizer must point at the same embedding model that encoded your content — same model and dimensions (this corpus used a 3,072-dimension text-embedding-3-large deployment; a mismatch fails, at best, loudly). And before you judge retrieval quality at all, open the index JSON and look for the vectorizers array.
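The second rule is easy to automate. Below is a small Python sketch that walks an index definition, loaded as a dict from the index JSON, and lists vector fields whose profile does not resolve to a declared vectorizer. The property names (`vectorSearch`, `profiles`, `vectorizers`, `vectorSearchProfile`) follow the index JSON as I understand it; the index, field, and profile names are invented, and the check says nothing about whether the vectorizer points at the right model.

```python
def vector_fields_without_vectorizer(index: dict) -> list[str]:
    """Return names of vector fields whose profile has no declared vectorizer."""
    vector_search = index.get("vectorSearch") or {}
    declared = {v["name"] for v in vector_search.get("vectorizers") or []}
    profile_to_vectorizer = {
        p["name"]: p.get("vectorizer") for p in vector_search.get("profiles") or []
    }
    missing = []
    for field in index.get("fields") or []:
        profile = field.get("vectorSearchProfile")
        if profile is None:
            continue  # not a vector field
        if profile_to_vectorizer.get(profile) not in declared:
            missing.append(field["name"])
    return missing


# Hypothetical index definition: embedded content, but nothing to embed the query.
index = {
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content_vector", "type": "Collection(Edm.Single)",
         "dimensions": 3072, "vectorSearchProfile": "docs-profile"},
    ],
    "vectorSearch": {
        "profiles": [{"name": "docs-profile", "algorithm": "docs-hnsw"}],
        "vectorizers": [],
    },
}

print(vector_fields_without_vectorizer(index))  # ['content_vector']
```

An empty list is the result you want before you spend any time judging retrieval quality.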

An index that looks finished and a retriever that finds nothing are the same artifact - seen before and after the first real question.

If retrieval seems weirdly useless, check the vectorizer before you re-chunk anything.

The failure I met later: chunks that outlive their sections

A fourth failure mode surfaced after the first re-index, so it earns its place here. In oneToMany mode, each section is its own search document — and when you edit a file and re-run the indexer, documents for deleted sections are not removed. The indexer overwrites what still exists and leaves the rest. Reorganize a glossary — merge two measures, rename a heading — and the old chunks keep answering questions alongside the new ones, which is its own species of silent wrong. The documented workarounds are a soft-delete pass on the blob before re-indexing, or explicitly deleting the file's documents first. Glossaries get reorganized constantly; build the purge step into the pipeline on day one.
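One way to sketch the explicit-delete option in Python: compare the heading paths currently in the file with the documents already indexed for it, and build a delete batch for the leftovers. The field names `id`, `source`, and `heading_path` are assumptions about how you might shape your own index, and the function only builds the payload; sending it to the index, and fetching the indexed documents in the first place, are left out.

```python
def build_purge_batch(indexed_docs: list[dict], current_paths: set[tuple],
                      source: str, key_field: str = "id") -> dict:
    """Build a delete batch for documents whose section no longer exists in `source`."""
    stale = [
        d for d in indexed_docs
        if d["source"] == source and tuple(d["heading_path"]) not in current_paths
    ]
    return {
        "value": [{"@search.action": "delete", key_field: d[key_field]} for d in stale]
    }


# Hypothetical state: "Old Score" was renamed away in glossary.md.
indexed_docs = [
    {"id": "g1", "source": "glossary.md", "heading_path": ["Engagement", "Scores", "Average Engagement Score"]},
    {"id": "g2", "source": "glossary.md", "heading_path": ["Engagement", "Scores", "Old Score"]},
    {"id": "s1", "source": "systems.md", "heading_path": ["Systems", "CRM"]},
]
current_paths = {("Engagement", "Scores", "Average Engagement Score")}

batch = build_purge_batch(indexed_docs, current_paths, "glossary.md")
print(batch)
```

Only the orphaned section from the edited file lands in the batch; documents from other files are untouched, so the step is safe to run per file on every re-index.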

Facts in the index, behavior in the prompt

The last lesson is architectural. It is tempting to push everything into the index - definitions, query-writing guides, formatting rules. This corpus included a procedural file: how to construct DAX queries, patterns to follow, result-formatting rules. It never went into the index. Retrieval is for facts the agent looks up; instructions are for behavior the agent always has. Procedural knowledge retrieved sometimes is behavior the agent exhibits sometimes — and every procedural chunk that does get retrieved competes with an actual fact for a retrieval slot. Keep how-to-act in the system prompt and what-is-true in the index, and both get stronger.

One constraint makes this split easier to respect than to violate: a Copilot Studio knowledge connection points at exactly one Azure AI Search index. You don't get an index per document type, so design the one index deliberately — one container, one indexer, heading path as fields, domain as a filter — and keep behavior out of it entirely. While you're in the schema, add a URL field per chunk: Copilot Studio treats metadata_storage_path (or any full-URL field) as the citation, and an agent that can cite which glossary entry it used is dramatically easier to debug than one that can't.

The short version

Chunk on meaning, not size - the wizard's fixed 2,000-character window is wrong for structured docs, and markdown parsing mode with the right header depth is a one-line fix. Embed each chunk's name with its body, because the parser strips headings out of content. Confirm the vectorizer exists and matches your embedding model before judging retrieval quality. Purge before you re-index, or deleted sections keep answering. And split facts from behavior - index the first, prompt the second. None of these failures announce themselves; all of them are five-minute fixes once named.