
Wondering why you didn’t include Elasticsearch [0] in your comparison.

Also, having some benchmarks to compare performance would help.

[0] https://www.elastic.co/guide/en/elasticsearch/reference/curr...


+1, I've been using OpenSearch (basically a fork of Elasticsearch 7.10), and have been pretty happy with the setup so far.

OpenSearch specifically has an edge over Elasticsearch because it supports vectors up to 10k dimensions, whereas ES maxes out at indexing 1024 dimensions, which isn't enough to support OpenAI's 1536 dimension vectors.

And then there's the benefit of it being well documented / Q&A'd, and able to support regular searching, faceting, etc. as well.

I compared a few options including OS/ES here: https://maven.com/blog/embeddings

Also, if you want to do hybrid retrieval with a legacy system in place, Elasticsearch is a good option. I would like to see a comparison of hybrid retrieval as well.

You aren't supposed to index vectors larger than ~128 dimensions. Because of concentration of measure, which is an aspect of the curse of dimensionality, the distances between high-dimensional vectors tend to become nearly identical.

You need to do dimensionality reduction before indexing. Basically, it's fine to just pick the first n components if you don't want anything fancy.
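
For anyone who wants to check the concentration-of-measure claim rather than take it on faith, here is a rough sketch using only the Python standard library. It measures how much the nearest and farthest distances from a random query differ as the dimension grows. The function names (relative_contrast, truncate) and the point counts are made up for illustration, and the data is i.i.d. Gaussian noise, so real embeddings, which are far more structured, may behave quite differently.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (max_dist - min_dist) / min_dist from one random query
    to n_points random points, all drawn from a standard Gaussian."""
    rng = random.Random(seed)
    query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


def truncate(vector, n):
    """Keep the first n components. This only makes sense when the
    components are ordered by importance (for example after PCA)."""
    return vector[:n]


if __name__ == "__main__":
    # A smaller value means nearest and farthest neighbors look more alike.
    for dim in (2, 16, 128, 1536):
        print(dim, round(relative_contrast(dim), 3))
```

The contrast shrinks as the dimension goes up, which is the effect being described. Whether ~128 is the right cutoff for a given embedding model is something this toy data cannot tell you.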

You can raise Elasticsearch’s maximum field count by changing the index.mapping.total_fields.limit index setting.

I can also add one more data point in favor of Elastic / OpenSearch. They benefit from a long history of providing search-specific features, including the ability to write custom re-ranking functions that combine traditional TF/IDF-style search with modern vector search techniques. And you can easily use OpenSearch with state-of-the-art open embedding models like SGPT that use 2048-dimensional vectors. Plus, it is designed to be highly scalable and distributed.
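
To make the re-ranking idea concrete, here is a minimal client-side sketch of blending a lexical score with vector similarity. It is plain Python, not the Elasticsearch/OpenSearch scoring API; hybrid_rerank, min_max, and the alpha weight are hypothetical names chosen for illustration, and min-max normalization is just one of several reasonable ways to put the two scores on the same scale.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(values):
    """Rescale values to [0, 1]; all-equal input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rerank(candidates, query_vec, alpha=0.5):
    """candidates: list of (doc_id, lexical_score, vector) tuples.
    alpha=1.0 ranks by lexical score only, alpha=0.0 by cosine only.
    Returns doc_ids ordered best-first."""
    if not candidates:
        return []
    lexical = min_max([c[1] for c in candidates])
    semantic = min_max([cosine(query_vec, c[2]) for c in candidates])
    scored = [
        (alpha * lex + (1.0 - alpha) * sem, c[0])
        for lex, sem, c in zip(lexical, semantic, candidates)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


if __name__ == "__main__":
    docs = [("a", 10.0, [1.0, 0.0]), ("b", 2.0, [0.0, 1.0]), ("c", 6.0, [1.0, 1.0])]
    print(hybrid_rerank(docs, [0.0, 1.0], alpha=0.5))
```

In a real deployment you would normally push this logic into the engine's own scoring hooks so that it runs next to the data; the sketch only shows the arithmetic.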

Given how well OpenSearch works and scales, I would find it hard to justify a specialized vector-specific database unless it brought A LOT of new benefits to the table. And I am not currently aware how any of them would actually do that.

Also, OpenSearch provides all of that out-of-the-box. You just configure a vector field mapping and start inserting your data. No need for an add-on plugin/extension. It just works.
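
As a rough idea of what "configure a vector field mapping" looks like, here is a sketch that builds an index-creation body as a Python dict and prints it as JSON. The knn_vector field type, its dimension parameter, and the index.knn setting are OpenSearch's; the helper name knn_index_body and the field names "embedding" and "text" are placeholders I made up. The resulting JSON would be sent as the body of an index-creation request, and the defaults may not suit your data.

```python
import json


def knn_index_body(field="embedding", dimension=1536):
    """Build a create-index body with one k-NN vector field.
    Field names here are illustrative assumptions, not requirements."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Vector field used for approximate nearest-neighbor search.
                field: {"type": "knn_vector", "dimension": dimension},
                # Ordinary full-text field living next to the vector.
                "text": {"type": "text"},
            }
        },
    }


if __name__ == "__main__":
    # 2048 matches the SGPT example above; 1536 would fit OpenAI embeddings.
    print(json.dumps(knn_index_body(dimension=2048), indent=2))
```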

Does OpenSearch support Kibana? Because I haven’t found a good Kibana replacement yet.

You should test it yourself on your own use case (e.g., vector dimension, prefiltering, throughput, target latency). In my testing, using identical HNSW configs between OS and a purpose-built vector DB, I saw 10x+ better performance with the vector DB, despite much lower CPU usage for the vector DB and even including internet latency for the vector DB (but not OS).

This may not matter if you are not doing high throughput / don't have tight latency requirements, but in my case, it did. Of course you should weigh that against the convenience of preexisting ES/OS clusters and so on. You can also use ES/OS together with a separate vector DB. (These tradeoffs are, of course, what makes a static benchmarking post like this one so hard to think about.)
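
If you do want to run that kind of comparison yourself, a small latency harness along these lines is one way to start. It is a sketch under assumptions: measure_latency is a made-up helper, search_fn stands in for whatever client call you use against each system, and it issues queries one at a time, so it says nothing about throughput under concurrent load.

```python
import statistics
import time


def measure_latency(search_fn, queries, warmup=5):
    """Call search_fn once per query and report latency stats in ms.
    search_fn is any callable taking a single query argument."""
    if not queries:
        raise ValueError("need at least one query")
    # Warm caches and connections before timing anything.
    for q in queries[:warmup]:
        search_fn(q)
    samples_ms = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()

    def pct(p):
        # Nearest-rank style percentile on the sorted samples.
        idx = min(len(samples_ms) - 1, int(p / 100.0 * len(samples_ms)))
        return samples_ms[idx]

    return {
        "p50": pct(50),
        "p95": pct(95),
        "p99": pct(99),
        "mean": statistics.fmean(samples_ms),
    }


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real search call.
    fake_queries = [[float(i)] * 64 for i in range(100)]
    print(measure_latency(lambda q: sum(x * x for x in q), fake_queries))
```

Run it against each system with the same queries, the same filters, and the same index settings, and from the same network location, otherwise the numbers are not comparable.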

I guess because ES/OS are text search engines and not vector databases. Some benchmarks: https://qdrant.tech/benchmarks/

Follow that link - Elastic has vector features now.

I find vector search more convincing as a feature of an existing database than as justification to design an entirely new database - it's basically a new type of index.
