RLS-aware AI tools: securing enterprise data in Copilot Studio
An AI assistant that queries a governed dataset under a service identity is a data leak with good manners. Carrying row-level security all the way through — and why on-behalf-of is the whole game.
Putting an AI assistant in front of a governed dataset is one of the most requested features in enterprise software right now. It is also one of the easiest to get quietly, dangerously wrong. The demo works. The integration leaks. The two facts are not in tension — they're the same fact, seen before and after someone checks who can see what.
The service-identity trap
Here is the integration almost every team builds first. The assistant needs to query a data platform — Power BI, a warehouse, an API. So it's given a service identity: one set of credentials, broad access, easy to wire up. The assistant calls the platform as itself, gets the data, answers the question. It demos beautifully.
It also means every user of the assistant now queries with the assistant's permissions, not their own. The row-level security on that dataset — the rules that say a regional manager sees their region and not the others — is still configured, still correct, and completely bypassed. The assistant has become a polite way to read data you were never cleared for.
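A toy model makes the bypass concrete. This is a minimal sketch, not any real platform's API: `Identity`, `platform_query`, and the sample rows are all hypothetical names invented for illustration. The platform's security rule is identical in both paths; the only thing that changes is which identity the platform sees.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Identity:
    name: str
    regions: FrozenSet[str]  # regions this identity is cleared to read


# Hypothetical governed dataset.
SALES: List[Dict] = [
    {"region": "west", "revenue": 120},
    {"region": "east", "revenue": 95},
    {"region": "north", "revenue": 60},
]


def platform_query(caller: Identity) -> List[Dict]:
    """Toy data platform: row-level security is applied to whoever calls."""
    return [row for row in SALES if row["region"] in caller.regions]


SERVICE = Identity("assistant-svc", frozenset({"west", "east", "north"}))
WEST_MANAGER = Identity("west-manager", frozenset({"west"}))


def answer_with_service_identity(user: Identity) -> List[Dict]:
    # The user is ignored: the platform only ever sees the service identity.
    return platform_query(SERVICE)


def answer_as_user(user: Identity) -> List[Dict]:
    # The platform sees the real user, so its own rule does the filtering.
    return platform_query(user)
```

In this sketch `answer_with_service_identity(WEST_MANAGER)` returns all three regions, while `answer_as_user(WEST_MANAGER)` returns only the west row. Nothing about the security rule changed between the two calls.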
On-behalf-of is the whole game
The fix is not a patch on top of the service-identity design. It's a different design. The assistant must call the data platform as the user who asked the question, carrying that user's identity through every hop.
The mechanism is the OAuth 2.0 on-behalf-of token exchange. The assistant receives an access token issued for the signed-in user, exchanges it for a new token scoped to the downstream data platform, and queries with that. Row-level security is now enforced by the platform that owns the data, not by application code and not by hope. Each person sees exactly the rows their role allows, because the query genuinely runs as them.
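As a rough illustration of the exchange, the sketch below builds the token request in the shape the Microsoft identity platform documents for the on-behalf-of flow. It assumes Microsoft Entra ID is the identity provider, which the article does not state. The function name `build_obo_request` is hypothetical, all argument values are placeholders, and the request is only constructed, not sent. In practice a library such as MSAL would usually handle this step.

```python
from typing import Tuple
from urllib.parse import urlencode


def build_obo_request(
    tenant_id: str,
    client_id: str,
    client_secret: str,
    user_access_token: str,
    downstream_scope: str,
) -> Tuple[str, str]:
    """Return (token endpoint URL, form-encoded body) for an OBO exchange.

    Assumes the Microsoft identity platform v2.0 token endpoint.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
            "client_id": client_id,
            "client_secret": client_secret,
            # The token the assistant received from the signed-in user.
            "assertion": user_access_token,
            # Scope of the downstream data platform, not of the assistant.
            "scope": downstream_scope,
            "requested_token_use": "on_behalf_of",
        }
    )
    return url, body
```

The response to this request, when it succeeds, carries an access token that still represents the original user but is addressed to the data platform. That token, not a service credential, is what goes on the query.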
If your AI tool calls the data platform under a service identity, it is a data leak with good manners.
An agent that can query can run a bad query
Identity is the first gate. The second is the query itself. An assistant that generates and executes queries can be steered — by an ambiguous request, or by deliberately injected instructions — into generating a query you would not have approved.
So the generated query gets validated before it executes: against an allowed shape, a permitted set of operations, and sane bounds such as a row limit. The agent proposes; a deterministic check disposes. This is not distrust of the model. It's the same defense in depth you'd apply to any input that reaches a query engine.
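One way to keep that check deterministic is to have the model emit a structured proposal rather than free-form query text, and validate the proposal against an allowlist. The sketch below assumes that design. `QueryProposal`, the table and column names, and the limits are all hypothetical, and a real deployment would tailor them to its own schema and query language.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical allowlist: table name -> columns the assistant may read.
ALLOWED_TABLES = {"sales": {"region", "month", "revenue"}}
ALLOWED_AGGREGATES = {"sum", "avg", "count"}
MAX_ROWS = 1000


@dataclass
class QueryProposal:
    table: str
    columns: List[str]
    aggregate: Optional[str] = None
    limit: int = 100


def validate(proposal: QueryProposal) -> List[str]:
    """Return a list of problems; an empty list means the query may run."""
    problems: List[str] = []
    allowed_columns = ALLOWED_TABLES.get(proposal.table)
    if allowed_columns is None:
        problems.append(f"table not allowed: {proposal.table}")
    else:
        for column in proposal.columns:
            if column not in allowed_columns:
                problems.append(f"column not allowed: {column}")
    if not proposal.columns:
        problems.append("no columns requested")
    if proposal.aggregate is not None and proposal.aggregate not in ALLOWED_AGGREGATES:
        problems.append(f"aggregate not allowed: {proposal.aggregate}")
    if not 1 <= proposal.limit <= MAX_ROWS:
        problems.append(f"limit out of bounds: {proposal.limit}")
    return problems
```

A proposal that names an unknown table, an unlisted column, or an oversized limit is rejected before anything reaches the query engine, and the list of problems can be returned to the model or logged.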
Instrument every call
The last requirement is the one that turns a clever feature into something a security team will sign off on: every tool call is logged — who asked, what query ran, what came back, how long it took. Structured telemetry to a real observability platform makes the assistant auditable. When someone asks "what did this thing do last Tuesday," there is an answer.
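A minimal sketch of that instrumentation, using only the Python standard library: each tool call is wrapped so that one JSON record is emitted whether the call succeeds or fails. The logger name, the field names, and the `run_logged` helper are assumptions for illustration. Shipping the records to an observability platform is left to whatever logging handler the deployment configures.

```python
import json
import logging
import time
from typing import Any, Callable, List

logger = logging.getLogger("assistant.tools")  # hypothetical logger name


def run_logged(
    user_id: str, query: str, execute: Callable[[str], List[Any]]
) -> List[Any]:
    """Run a tool call and emit one structured log record describing it."""
    started = time.perf_counter()
    record = {"event": "tool_call", "user": user_id, "query": query}
    try:
        rows = execute(query)
        record.update(status="ok", row_count=len(rows))
        return rows
    except Exception as exc:
        record.update(status="error", error=type(exc).__name__)
        raise
    finally:
        # Runs on both paths, so failed calls are audited too.
        record["duration_ms"] = round((time.perf_counter() - started) * 1000, 2)
        logger.info(json.dumps(record))
```

This sketch records the row count rather than the rows themselves, so the audit log does not become a second copy of the governed data. Whether to capture more of the result is a decision for the security team.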
The short version
Carry the user's identity all the way through. Let the data platform enforce its own row-level security. Validate the generated query before it runs. Log every call. Do those four things and an AI assistant over governed data stops being a risk you're quietly carrying and becomes a capability you can defend.