+1. I've been using OpenSearch (basically a fork of Elasticsearch 7.10) and have been pretty happy with the setup so far.

OpenSearch specifically has an edge over Elasticsearch because it supports vectors of up to 10k dimensions, whereas ES maxes out at 1024 dimensions for indexed vectors, which isn't enough for OpenAI's 1536-dimension embeddings.

And then there's the benefit of it being well documented / Q&A'd, and able to support regular searching, faceting, etc. as well.
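
For anyone who wants to try this, here is a rough sketch of a create-index body that puts a 1536-dimension vector field next to an ordinary text field. It assumes the OpenSearch k-NN plugin is enabled; the field names `embedding` and `text` and the helper `knn_index_body` are made up for illustration, and the snippet only builds the request body without contacting a cluster.

```python
def knn_index_body(field: str = "embedding", dimension: int = 1536) -> dict:
    """Build a create-index body with one knn_vector field and one text field.

    Assumption: the target is OpenSearch with the k-NN plugin available.
    The field names here are hypothetical.
    """
    return {
        # Enables k-NN search structures for this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Vector field for embeddings (e.g. 1536-dim OpenAI vectors).
                field: {"type": "knn_vector", "dimension": dimension},
                # Plain full-text field, so regular search still works.
                "text": {"type": "text"},
            }
        },
    }


body = knn_index_body()
```

The resulting dict would be sent as the body of a `PUT /<index-name>` request, for example through whichever client you already use.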


I compared a few options, including OS/ES, here: https://maven.com/blog/embeddings

Also, if you want to do hybrid retrieval with a legacy system already in place, Elasticsearch is a good option. I would like to see some comparison of hybrid retrieval as well.

You aren't supposed to index vectors larger than ~128 dimensions. Because of concentration of measure, which is an aspect of the curse of dimensionality, the distances between high-dimensional vectors tend to become nearly identical.
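
A quick way to see the effect is to measure how much the farthest and nearest neighbours of a query differ as the dimension grows. This is only a sketch on i.i.d. Gaussian points, which is an assumption and close to a worst case; real embeddings have structure, so the effect there may be weaker. The helper name `relative_contrast` is made up for this example.

```python
import numpy as np


def relative_contrast(dim: int, n_points: int = 2000, seed: int = 0) -> float:
    """Return (max - min) / min over distances from one random query to random points.

    Values near zero mean all points look roughly equally far from the query.
    """
    rng = np.random.default_rng(seed)
    points = rng.standard_normal((n_points, dim))  # synthetic data, not real embeddings
    query = rng.standard_normal(dim)
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distances to the query
    return float((dists.max() - dists.min()) / dists.min())


# Contrast at a low, a medium, and an embedding-sized dimension.
contrasts = {d: relative_contrast(d) for d in (2, 128, 1536)}
for d, c in contrasts.items():
    print(f"dim={d:5d}  relative contrast={c:.3f}")
```

On this synthetic data the contrast shrinks steadily as the dimension goes up, which is the concentration effect being described.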

You need to do dimensionality reduction before indexing. Basically, it's fine to just pick the first n components if you don't want anything fancy.
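
One reading of "first n components" is the leading principal components, so here is a minimal PCA sketch with NumPy under that assumption. The random matrix is a stand-in for real embeddings, and `fit_pca` and `project` are hypothetical helper names.

```python
import numpy as np


def fit_pca(vectors: np.ndarray, n_components: int):
    """Return (mean, components) for projecting rows down to n_components dims."""
    mean = vectors.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(vectors - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(vectors: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Center the rows and express them in the kept principal directions."""
    return (vectors - mean) @ components.T


rng = np.random.default_rng(0)
embeddings = rng.standard_normal((500, 1536))  # stand-in for real 1536-dim embeddings
mean, components = fit_pca(embeddings, n_components=128)
reduced = project(embeddings, mean, components)
print(reduced.shape)
```

The same `mean` and `components` have to be applied to query vectors at search time, otherwise queries and indexed documents end up in different spaces.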

You can raise Elasticsearch’s limit on the number of fields by modifying the index.mapping.total_fields.limit index setting.
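
As a sketch, the settings update could look like the request below. The index name `my-index` and the value 2000 are placeholders, and `total_fields_request` is a made-up helper that only builds the request pieces. As far as I know this setting caps the number of mapped fields in an index, not the number of dimensions in a vector field.

```python
def total_fields_request(index: str, limit: int) -> tuple[str, str, dict]:
    """Build (method, path, body) for an update-index-settings call.

    Assumption: sent to the standard _settings endpoint of an existing index.
    """
    return (
        "PUT",
        f"/{index}/_settings",
        {"index.mapping.total_fields.limit": limit},
    )


method, path, body = total_fields_request("my-index", 2000)
print(method, path, body)
```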
