Comment by vinni2 - Hacker Neue

vinni2 Jul 31, 2023

Wondering why you didn’t include Elasticsearch [0] in your comparison.

Also, having some benchmarks to compare performance would help.

[0] https://www.elastic.co/guide/en/elasticsearch/reference/curr...

shreyans Jul 31, 2023

+1, I've been using OpenSearch (basically Elasticsearch 7.0), and have been pretty happy with the setup so far.

OpenSearch specifically has an edge over Elasticsearch because it supports vectors up to 10k dimensions, whereas ES maxes out at indexing 1024 dimensions, which isn't enough to support OpenAI's 1536-dimensional vectors.

And then there's the benefit of it being well documented / Q&A'd, and able to support regular searching, faceting, etc. as well.

shreyans Jul 31, 2023

I compared a few options including OS/ES here: https://maven.com/blog/embeddings

vinni2 OP Jul 31, 2023

Also, if you want to do hybrid retrieval with a legacy system in place, Elasticsearch is a good option. I would like to see some comparison for hybrid retrieval as well.
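
For readers unfamiliar with hybrid retrieval: one common way to merge a keyword result list with a vector result list is reciprocal rank fusion (RRF). The thread does not name a fusion method, so RRF is an assumption here, and the function name, the document IDs, and the constant k=60 are illustrative only. A minimal sketch:

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists: one from keyword (BM25) search, one from vector search.
keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "a"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

RRF uses only ranks, so it avoids having to normalize BM25 scores against vector similarities, which live on different scales.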

keskival Jul 31, 2023

You aren't supposed to index vectors larger than ~128 dimensions. Because of concentration of measure, which is an aspect of the curse of dimensionality, the distances between high-dimensional vectors tend to become nearly identical.

You need to do dimensionality reduction before indexing. Basically, it's fine to just pick the first n components if you don't want anything fancy.
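
To make both points concrete, here is a small stdlib-only sketch. It assumes independent Gaussian random vectors, which is a worst case and not representative of real embeddings, and the helper names are made up for illustration. The first function truncates a vector to its first n components and re-normalizes it; the second measures how much distances from one query to a set of random points spread out at a given dimension.

```python
import math
import random


def truncate_and_normalize(vec, n):
    """Keep the first n components and rescale the result to unit length."""
    head = list(vec[:n])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale for an all-zero vector
    return [x / norm for x in head]


def relative_contrast(dim, num_points=200, seed=0):
    """Return (max - min) / min of distances from one random query to random points.

    Values near zero mean nearest and farthest neighbours are almost equally far.
    """
    rng = random.Random(seed)
    query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dists = []
    for _ in range(num_points):
        point = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


low_dim_contrast = relative_contrast(2)
high_dim_contrast = relative_contrast(1024)
```

On this synthetic data the contrast at 1024 dimensions comes out far smaller than at 2 dimensions. Whether plain truncation preserves retrieval quality for a real embedding model depends on how that model distributes information across components, so it is worth checking recall before and after on your own data.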

MF-DOOM Jul 31, 2023

You can increase Elasticsearch’s max fields limit by modifying the index.mapping.total_fields.limit index setting.

readyplayeremma Jul 31, 2023

I can also add one more data point in favor of Elastic / OpenSearch. They benefit from a long history of providing search-specific features, including the ability to write custom re-ranking functions that combine traditional TF/IDF-style search with modern vector search techniques. You can also easily use OpenSearch with state-of-the-art open embedding models like SGPT that use 2048-dimensional vectors. Plus, it is designed to be highly scalable and distributed.

Given how well OpenSearch works and scales, I would find it hard to justify a specialized vector-specific database unless it brought A LOT of new benefits to the table. And I am not currently aware how any of them would actually do that.

Also, OpenSearch provides all of that out-of-the-box. You just configure a vector field mapping and start inserting your data. No need for an add-on plugin/extension. It just works.
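
As a rough illustration of "configure a vector field mapping", the sketch below builds the JSON bodies for creating a k-NN index and running a k-NN query in OpenSearch. The field name "embedding" and the helper function names are hypothetical, and engine and HNSW parameters are left at their defaults. The code only constructs the request bodies and does not contact a cluster.

```python
import json


def build_index_body(dimension, field="embedding"):
    """Request body for creating an index with one knn_vector field."""
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {
            "properties": {
                field: {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def build_knn_query(vector, k, field="embedding"):
    """Request body for a k-NN search returning the k nearest documents."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


index_body = build_index_body(1536)
query_body = build_knn_query([0.1] * 1536, k=5)
index_json = json.dumps(index_body, indent=2)  # body for index creation
```

The first body would be sent when creating the index and the second to the index's _search endpoint, using whichever HTTP or OpenSearch client you already have.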

vinni2 OP Jul 31, 2023

Does OpenSearch support Kibana? Because I haven’t found a good Kibana replacement yet.

huac Jul 31, 2023

You should test it yourself on your own use case (e.g., vector dimension, prefiltering, throughput, target latency). In my testing, using identical HNSW configs between OS and a purpose-built vector DB, I saw 10x+ better performance with the vector DB, despite much lower CPU usage for the vector DB and even including internet latency for the vector DB (but not OS).

This may not matter if you don't need high throughput or don't have tight latency requirements, but in my case, it did. Of course, you should weigh that against the convenience of preexisting ES/OS clusters and so on. You can also use ES/OS together with a separate vector DB. (These tradeoffs are, of course, what make a static benchmarking post like this one so hard to think about.)

andre-z Jul 31, 2023

I guess because ES/OS are text search engines and not vector databases. Some benchmarks: https://qdrant.tech/benchmarks/

simonw Jul 31, 2023

Follow that link - Elastic has vector features now.

I find vector search more convincing as a feature of an existing database than as justification to design an entirely new database - it's basically a new type of index.
