
I believe most vector databases let you attach additional metadata to vectors. Why not simply store as metadata the list of principals (roles/groups) that have access to the information (e.g. HR, executives)? Then, when a user makes a request to the chatbot, you expand the user's identity into his/her principals (e.g. HR) and use those as implicit filtering criteria when searching for the closest vectors in the database.

In this way you exclude up-front the documents that the current user cannot see.
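A minimal sketch of the idea, assuming an in-memory list of chunks and brute-force cosine similarity in place of a real vector database (which would push the filter into its own query). `Chunk`, `USER_PRINCIPALS`, `expand_principals` and `search` are hypothetical names used only for illustration:

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_id: str
    vector: list[float]
    # Principals (roles/groups) allowed to see the source document.
    allowed_principals: set[str] = field(default_factory=set)


# Hypothetical directory lookup: maps a user to the principals they belong to.
USER_PRINCIPALS = {
    "alice": {"HR"},
    "bob": {"executives"},
}


def expand_principals(user: str) -> set[str]:
    """Expand a user identity into his/her principals (empty if unknown)."""
    return USER_PRINCIPALS.get(user, set())


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(chunks: list[Chunk], query_vec: list[float], user: str, k: int = 3) -> list[str]:
    principals = expand_principals(user)
    # Pre-filter: keep only chunks sharing at least one principal with the user,
    # so documents the user cannot see never enter the ranking.
    visible = [c for c in chunks if c.allowed_principals & principals]
    visible.sort(key=lambda c: cosine(c.vector, query_vec), reverse=True)
    return [c.doc_id for c in visible[:k]]
```

With a document tagged `{"HR"}` and another tagged `{"executives"}`, a search on behalf of `alice` can only ever return the first one, regardless of how similar the second is to the query.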

Of course, this requires you to update the vector metadata whenever permissions change at the document level. For example, if a document originally visible only to HR becomes visible to executives as well, you need to add the principal executives to the metadata of the vector(s) derived from that document in your vector database.
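A rough sketch of that update step, assuming a hypothetical in-memory `MetadataIndex` standing in for the vector database's metadata store and a hypothetical `on_permissions_changed` hook called whenever a document's ACL changes. Because a document is often split into several chunks, each with its own vector, the sketch keeps a `doc_id -> chunk ids` mapping so every derived vector gets updated:

```python
from dataclasses import dataclass, field


@dataclass
class ChunkRecord:
    chunk_id: str
    doc_id: str
    allowed_principals: set[str] = field(default_factory=set)


class MetadataIndex:
    """Hypothetical in-memory stand-in for the vector DB's metadata store."""

    def __init__(self) -> None:
        self.records: dict[str, ChunkRecord] = {}
        # Reverse mapping so a permission change on one document can find
        # all vectors derived from it.
        self.by_doc: dict[str, set[str]] = {}

    def add(self, record: ChunkRecord) -> None:
        self.records[record.chunk_id] = record
        self.by_doc.setdefault(record.doc_id, set()).add(record.chunk_id)

    def on_permissions_changed(self, doc_id: str, new_principals: set[str]) -> int:
        """Overwrite the ACL on every chunk of doc_id; return how many were updated."""
        updated = 0
        for chunk_id in self.by_doc.get(doc_id, ()):
            self.records[chunk_id].allowed_principals = set(new_principals)
            updated += 1
        return updated
```

The sketch replaces the whole principal set instead of appending one principal; that handles revocations (a principal losing access) with the same code path and makes the update idempotent.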


This is the correct answer. You pre-filter on a permissions-correlated field like this one, then post-filter the results for the finer-grained permission checks.
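One way the two stages could fit together, as a sketch only: `prefiltered_search` (the vector DB query already restricted by principals) and `fine_check` (the deeper per-document check) are hypothetical callables, and `overfetch` is an assumed knob that asks for extra candidates so enough survive the post-filter:

```python
from typing import Callable


def secure_search(
    prefiltered_search: Callable[[str, str, int], list[str]],
    fine_check: Callable[[str, str], bool],
    user: str,
    query: str,
    k: int = 5,
    overfetch: int = 3,
) -> list[str]:
    # Stage 1: coarse pre-filter inside the vector DB (principal metadata).
    # Over-fetch because some candidates may fail the finer checks below.
    candidates = prefiltered_search(user, query, k * overfetch)
    # Stage 2: exact post-filter, keeping the similarity order.
    results: list[str] = []
    for doc_id in candidates:
        if fine_check(user, doc_id):
            results.append(doc_id)
            if len(results) == k:
                break
    return results
```

The cheap pre-filter removes most inaccessible documents, so the expensive `fine_check` only runs on a small, already-relevant candidate set.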
I am in control of the vector database and the search index. I have no control over the various data sources being accessed, which don't even allow querying the access rights per resource (they only allow a can_access check for a given user).
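When the only primitive available is a per-user `can_access` check, one possible fallback (an assumption, not something proposed in the thread) is to post-filter only: page through the ranked results and call `can_access` until `k` accessible resources are found. `search_page`, `can_access` and `max_checks` below are hypothetical; the cap bounds the number of calls to the external sources at the cost of possibly returning fewer than `k` results:

```python
from typing import Callable


def search_with_access_checks(
    search_page: Callable[[str, int, int], list[str]],
    can_access: Callable[[str, str], bool],
    user: str,
    query: str,
    k: int = 5,
    page_size: int = 20,
    max_checks: int = 200,
) -> list[str]:
    results: list[str] = []
    # Per-request cache: each resource is checked at most once, even if
    # several of its chunks appear in the ranking.
    checked: dict[str, bool] = {}
    offset = 0
    while len(results) < k and len(checked) < max_checks:
        page = search_page(query, offset, page_size)
        if not page:
            break  # index exhausted
        for resource_id in page:
            if resource_id in checked:
                continue
            if len(checked) >= max_checks:
                break
            checked[resource_id] = can_access(user, resource_id)
            if checked[resource_id]:
                results.append(resource_id)
                if len(results) == k:
                    break
        offset += page_size
    return results
```

Since no pre-filter is possible here, latency grows with the fraction of top-ranked resources the user cannot see; caching `can_access` results across requests (with a short TTL) would be a further, separate trade-off against permission freshness.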
