
Two points/questions:

1. Why is tracking access rights "on a per user basis or [...] along with the content" not feasible? A few mentions: Google Zanzibar (+ Ory Keto as an OSS implementation) makes authz for content orthogonal to apps (i.e. it can live in one place, s.t. both Jira and a Jira MCP server use the same API to check authz, which makes 100% faithful authz logic in the MCP server possible - see the sketch after this list); Eclipse Biscuit (as far as I understand, this is Dassault's attempt to make JWTs on steroids by adding Datalog and attenuation to the tokens, going in the Zanzibar direction but without requiring a network call for every single check); Apache Accumulo (a DBMS with cell-level security); and others. The way I see it, the tech is there, but so far not enough attention has been paid to the problem of high-fidelity, granular authz throughout the enterprise.

2. What is the scale actually needed? Enterprises with more than 10,000 employees are quite rare, and many individual internal IT systems even in large companies have fewer than 100 regular users. At these levels of scale, a lot of approaches become feasible that would not be considered possible at Google scale (i.e. algorithms that are more expensive in big-O terms are viable).
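
To make point 1 concrete, here is a minimal sketch of what a Zanzibar-style check could look like from an MCP server. It assumes an Ory Keto-style read API; the base URL, namespace and relation names are placeholders, and the endpoint/parameter names are my best recollection of Keto's check API rather than a verified reference.

    import requests

    KETO_READ_URL = "http://keto-read:4466"  # placeholder deployment address

    def can_view(user_id: str, doc_id: str) -> bool:
        # Ask the central authz service: does <user> have the "view" relation on <doc>?
        # Endpoint and parameter names follow Ory Keto's relation-tuple check API
        # (an assumption); any Zanzibar-style service exposes an equivalent check.
        resp = requests.get(
            f"{KETO_READ_URL}/relation-tuples/check",
            params={
                "namespace": "documents",  # placeholder namespace
                "object": doc_id,
                "relation": "view",
                "subject_id": user_id,
            },
            timeout=2,
        )
        resp.raise_for_status()
        return resp.json()["allowed"]

Since Jira and a Jira MCP server would both call the same check, the MCP server never has to re-implement (and drift away from) the source system's authz logic.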


Because the problem is not "get a list of what a user can access" but "the AI that was trained on a dataset must not leak it to a user who doesn't have access to it".

There is no feasible way to track that during training (at least not yet), so the only current solution would be to train the AI agent only on data the user can access, and that is costly.

Who said it must be done during training? Most enterprise data is accessed after training - via RAG or MCP tool calls. I can see how the techniques I mentioned above could be applied during RAG (in vector stores adopting Apache Accumulo's ideas) or in MCP servers (MCP OAuth + RFC 8693 OAuth 2.0 Token Exchange + Zanzibar/Biscuit for faithfully replicating the authz constraints of the systems the data is retrieved from).
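
As a rough sketch of the RAG side (the class and function names are made up for illustration; the idea is Accumulo-style visibility labels stored alongside each chunk and evaluated against the caller's authorizations before anything reaches the LLM):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Chunk:
        text: str
        visibility: frozenset  # labels that must ALL be held to see this chunk

    def visible(chunk: Chunk, user_auths: set) -> bool:
        # Simplified check: the user must hold every label on the chunk.
        # Real Accumulo visibilities are boolean expressions (AND/OR, parentheses).
        return chunk.visibility <= user_auths

    def filter_retrieved(candidates: list, user_auths: set, k: int = 5) -> list:
        # 'candidates' are chunks already ranked by vector similarity, best first.
        # A production system would push this filter into the vector store itself,
        # otherwise recall quietly degrades for users with few authorizations.
        return [c for c in candidates if visible(c, user_auths)][:k]

    # A user cleared only for {"sales"} never sees the HR chunk:
    docs = [Chunk("Q3 pipeline review", frozenset({"sales"})),
            Chunk("compensation bands", frozenset({"hr", "mgmt"}))]
    print([c.text for c in filter_retrieved(docs, {"sales"})])  # ['Q3 pipeline review']
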
As I understand it, there's no real way to enforce access rights inside an LLM. If the bot has access to some data, and you have access to the bot, you can potentially trick it into coughing up the data regardless of whether you're supposed to see that info or not.
MCP tools with OAuth support + RFC 8693 OAuth 2.0 Token Exchange (aka the OAuth 2.0 On-Behalf-Of flow in Azure Entra - though I don't think MCP 2025-06-18 accounts for RFC 8693) could be used to limit the MCP bot's responses to what the current user is authorized to see.
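
A rough sketch of that exchange step (the token endpoint, audience and client credentials are placeholders; the grant_type and parameter names come from RFC 8693 itself):

    import requests

    def exchange_token(user_access_token: str) -> str:
        # RFC 8693 token exchange: the MCP server trades the access token it received
        # from the user for a downstream token scoped to the backing system, so every
        # retrieval runs with the user's privileges rather than the bot's.
        resp = requests.post(
            "https://idp.example.com/oauth2/token",  # placeholder token endpoint
            data={
                "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
                "subject_token": user_access_token,
                "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
                "audience": "https://jira.internal.example.com",  # placeholder audience
                "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
            },
            auth=("mcp-server-client-id", "mcp-server-client-secret"),  # placeholder client creds
            timeout=5,
        )
        resp.raise_for_status()
        return resp.json()["access_token"]

The backing system then enforces its own authz on the exchanged token, so the MCP response can only contain what the user could have fetched directly.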
