Comment by giamma - Hacker Neue

giamma Aug 20, 2025 parent

I believe most vector databases allow you to annotate vectors with additional metadata. Why not simply add, as metadata, the list of principals (roles/groups) that have access to the information (e.g. HR, executives)? Then, when a user makes a request to the chatbot, you expand the user's identity to their principals (e.g. HR) and use those as implicit filter criteria when searching for the closest vectors in the database.

In this way you exclude up-front the documents that the current user cannot see.
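A minimal sketch of that idea, using a plain in-memory list instead of a real vector database. The names `INDEX`, `USER_PRINCIPALS`, and `search` are made up for illustration, and the two-dimensional vectors are toy data; an actual vector database would apply the principal filter inside its own query.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical index: each vector is annotated with the principals allowed to read it.
INDEX = [
    {"id": "salaries", "vector": [1.0, 0.0], "principals": {"HR"}},
    {"id": "strategy", "vector": [0.9, 0.1], "principals": {"HR", "executives"}},
    {"id": "handbook", "vector": [0.0, 1.0], "principals": {"HR", "executives", "staff"}},
]

# Hypothetical identity expansion: user -> roles/groups.
USER_PRINCIPALS = {"alice": {"HR"}, "bob": {"executives"}, "carol": {"staff"}}

def search(user, query_vector, k=2):
    """Return the ids of the k closest vectors the user is allowed to see."""
    principals = USER_PRINCIPALS.get(user, set())
    # Pre-filter: drop every vector that shares no principal with the user.
    visible = [e for e in INDEX if e["principals"] & principals]
    ranked = sorted(visible, key=lambda e: cosine(query_vector, e["vector"]), reverse=True)
    return [e["id"] for e in ranked[:k]]
```

With this toy data, the same query about salaries returns different documents for `alice` (HR) and `bob` (executives), and a user with no known principals gets nothing back.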

Of course, this requires you to update the vector metadata any time permissions change at the document level (e.g. if a document originally visible only to HR becomes visible to executives as well, you need to add the principal "executives" to the metadata of the vector derived from that document in your vector database).
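Roughly, the update step could look like the sketch below. `METADATA`, `grant`, and `revoke` are hypothetical names, and the sketch assumes a document may have been split into several chunks, so every vector derived from it has to be touched.

```python
# Hypothetical metadata store: vector id -> metadata kept alongside the vector.
METADATA = {
    "doc-42#0": {"doc_id": "doc-42", "principals": {"HR"}},
    "doc-42#1": {"doc_id": "doc-42", "principals": {"HR"}},
    "doc-7#0": {"doc_id": "doc-7", "principals": {"staff"}},
}

def grant(doc_id, principal):
    """Add a principal to every vector derived from doc_id; return how many changed."""
    updated = 0
    for meta in METADATA.values():
        if meta["doc_id"] == doc_id and principal not in meta["principals"]:
            meta["principals"].add(principal)
            updated += 1
    return updated

def revoke(doc_id, principal):
    """Remove a principal from every vector derived from doc_id; return how many changed."""
    updated = 0
    for meta in METADATA.values():
        if meta["doc_id"] == doc_id and principal in meta["principals"]:
            meta["principals"].discard(principal)
            updated += 1
    return updated
```

Calling `grant("doc-42", "executives")` would cover the HR-to-executives example. Until that call runs, the index still reflects the old permissions, which is the staleness risk of this approach.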

sporkland Aug 20, 2025

This is the correct answer. You pre-filter on a permissions-correlated field like this, then post-filter the results for the deeper permission checks.
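A rough sketch of the two stages, assuming the hits already come back ranked best-first. `RANKED_HITS`, `DENIED`, and `can_access` are stand-ins invented here; in practice `can_access` would be a call to whatever system owns the document.

```python
USER_PRINCIPALS = {"alice": {"HR"}, "bob": {"HR"}}

# Hypothetical search hits, already ordered by similarity (best first).
RANKED_HITS = [
    {"doc_id": "review-alice", "principals": {"HR"}},
    {"doc_id": "review-bob", "principals": {"HR"}},
    {"doc_id": "board-minutes", "principals": {"executives"}},
    {"doc_id": "hr-policy", "principals": {"HR"}},
]

# Stand-in for a fine-grained rule the coarse metadata cannot express:
# here, HR staff may not read their own performance review.
DENIED = {("alice", "review-alice"), ("bob", "review-bob")}

def can_access(user, doc_id):
    """Stand-in for the authoritative per-user, per-document check."""
    return (user, doc_id) not in DENIED

def retrieve(user, k=2):
    principals = USER_PRINCIPALS.get(user, set())
    results = []
    for hit in RANKED_HITS:
        # Pre-filter: cheap check against the principals stored as metadata.
        if not (hit["principals"] & principals):
            continue
        # Post-filter: authoritative check, only for hits that survived the pre-filter.
        if can_access(user, hit["doc_id"]):
            results.append(hit["doc_id"])
            if len(results) == k:
                break
    return results
```

The pre-filter keeps the number of expensive `can_access` calls small. The post-filter catches whatever the coarse field gets wrong, including metadata that has gone stale.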

planb Aug 20, 2025

I am in control of the vector database and the search index. I have no control over the various data sources being accessed, which don’t even allow querying access rights per resource (they only allow can_access checks for a given user).
