Preferences

Hey, this looks great! I'm a huge fan of vectors in Postgres or wherever your data lives, and this seems like a great abstraction.

When I write a sql query that includes a vector search and some piece of logic, like: ``` select name from users where age > 21 order by <vector_similarity(users.bio, "I like long walks on the beach")> limit 10; ``` Does it filter by age first or second? I've liked the DX of pg_vector, but they do vector search, followed by filtering. It seems like that slows down what should be the superpower of a setup like this.

Here's a bit more of a complicated example of what I'm talking about: https://blog.bawolf.com/p/embeddings-are-a-good-starting-poi...


(post co-author here)

It could do either depending on on what the planner decides. In pgvector it usually does post-filtering in practice (filter after vector search).

pgvector HNSW has the problem that there is a cutoff of retrieving some constant C results and if none of them match the filter than it won't find results. I believe newer version of pgvector address that. Also pgvectorscale's StreamingDiskANN[1] doesn't have that problem to begin with.

[1]: https://www.timescale.com/blog/how-we-made-postgresql-as-fas...

pg_vector does post-filtering, not pre-filtering
timescaledbs pg_vector_scale extension does pre-filtering thankfully. shame i cant get it in RDS though
You can request it for RDS

This item has no comments currently.

Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal