Embeddings are what encode the “meaning” of a given text as a vector. Similarity search works by comparing your query vector against the vectors already stored, usually via cosine similarity (the cosine of the angle between them), and returning the closest matches.
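A minimal sketch of that comparison, assuming the embeddings have already been computed by some model and are stored as plain lists of floats. The names `cosine_similarity` and `top_k` and the toy 3-dimensional vectors are hypothetical, just for illustration. It's a brute-force scan over every stored vector; dedicated vector stores add indexes so they don't have to do that.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction, 0.0 = orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query, stored, k=3):
    """Brute force: score every stored (id, vector) pair and keep the k most similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical); real embedding models output hundreds of dimensions.
stored = {
    "q1": [1.0, 0.0, 0.0],
    "q2": [0.7, 0.7, 0.0],
    "q3": [0.0, 0.0, 1.0],
}
query = [0.9, 0.1, 0.0]
print(top_k(query, stored, k=2))
```

Here `q1` points almost the same way as the query, `q2` is about 40 degrees off, and `q3` is orthogonal, so the top two results are `q1` then `q2`.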
DuckDB (and columnar stores in general) is great at aggregation. It’s also convenient here because a DuckDB database is a single file that you query in-process. There’s no server to muck with.
I've got a crapload of JSON discussions on a topic, formatted as Q&A, and I'm trying to figure out whether to just store them somewhere and query them, or to also generate vector embeddings. I'm kinda lost with all the possible options.