I have GitHub and Twitter accounts. I'll let you guess my handles.
- Someone reported it, and I answered today [1]. The rule is too strict on the frontend side, and we will fix it by using a better hybrid search setup (not only semantic). Thank you for the report.
[1]: https://github.com/meilisearch/meilisearch/issues/5504#issue...
- v1.14, released yesterday [1], ships with a search embedding cache. Most of the time you see is spent waiting for an OpenAI embedding response. We also just shipped composite embedders to reduce network latency when you need to respond quickly to user searches (by running embedders on the Meilisearch server) while still using external APIs to index many documents in batches. Note that this only works with open-source embedders, the ones HuggingFace serves (a rough configuration sketch follows below).
[1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...
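To make the composite embedder idea concrete, here is a rough sketch of what the embedder settings could look like, sent from Rust with `reqwest` and `serde_json`. The index name, model, endpoint, and the exact field names (`searchEmbedder`, `indexingEmbedder`, the `rest` templates) are my assumptions from the release notes, so verify them against the official documentation before using any of this.

```rust
// Hypothetical sketch only (field names and endpoint are assumptions):
// a composite embedder whose search side runs an open-source model locally
// (low latency at query time) while the indexing side calls a remote API
// serving the *same* model for large batches.
// Requires the `reqwest` (blocking + json) and `serde_json` crates.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let settings = json!({
        "embedders": {
            "default": {
                "source": "composite",
                // Local embedder used at search time, on the Meilisearch server.
                "searchEmbedder": {
                    "source": "huggingFace",
                    "model": "BAAI/bge-small-en-v1.5"
                },
                // Remote embedder used while indexing documents in batches.
                // It must produce the same embeddings as the search side,
                // hence the open-source-models-only restriction.
                "indexingEmbedder": {
                    "source": "rest",
                    // The `rest` source also needs request/response templates
                    // mapping your endpoint's payload shape (see the docs).
                    "url": "https://example.com/embed"
                }
            }
        }
    });

    let client = reqwest::blocking::Client::new();
    client
        .patch("http://localhost:7700/indexes/movies/settings")
        .header("Authorization", "Bearer MASTER_KEY")
        .json(&settings)
        .send()?
        .error_for_status()?;
    Ok(())
}
```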
- HuggingFace has been using Meilisearch in production on their website for a year now.
- You should try Meilisearch, then; you'll be astonished by the quality of the results and the ease of setup.
- Right. We released a lot of new versions of the engine to improve its indexing. v1.12 improved document indexing a lot! Have you tried the latest version, v1.14, which we released yesterday?
While Meilisearch is capable of limiting its resident memory (actual mallocs), it requires a bare minimum (about 1 GiB).
- Meilisearch is faster when you reduce the dataset by filtering it (a minimal query sketch follows below). I wrote an article on this subject [1].
[1]: https://blog.kerollmops.com/meilisearch-vs-qdrant-tradeoffs-...
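To illustrate the point, here is a minimal sketch of a filtered search in Rust with `reqwest` and `serde_json`. The `movies` index, the attribute names, and the keys are made up for the example; the routes and filter syntax are the documented ones, but double-check them against your Meilisearch version.

```rust
// Minimal filtered-search sketch (hypothetical index, attributes, and keys).
// Requires the `reqwest` (blocking + json) and `serde_json` crates.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();

    // One-time setting: declare which attributes can be used in filters.
    // (Settings updates are asynchronous tasks; wait for the task in real code.)
    client
        .put("http://localhost:7700/indexes/movies/settings/filterable-attributes")
        .header("Authorization", "Bearer MASTER_KEY")
        .json(&json!(["genres", "release_date"]))
        .send()?
        .error_for_status()?;

    // The filter narrows the candidate set before ranking, which is why the
    // query gets cheaper than searching the whole dataset.
    let results: serde_json::Value = client
        .post("http://localhost:7700/indexes/movies/search")
        .header("Authorization", "Bearer MASTER_KEY")
        .json(&json!({
            "q": "space opera",
            "filter": "genres = 'Science Fiction' AND release_date >= 1577836800"
        }))
        .send()?
        .error_for_status()?
        .json()?;

    println!("{results:#}");
    Ok(())
}
```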
- 35 GiB is probably a third of the data I index into Meilisearch just for experimenting, and don't forget about the inverted indexes. You wouldn't want to use an O(n) algorithm to search your documents.
Also, every time you need to reboot the engine, you would have to reindex everything from scratch. Not a good strategy, believe me.
- > [..] to simplify the setup?
It would be simpler to keep Meilisearch and its key-value store out of Postgres' WAL and the like, and instead provide a good SQL exporter (which is in the plan).
- The best thing you can do is put Meilisearch on a very good NVMe drive. I am indexing large streams of content (Bsky posts + likes), and I assure you that I tested Meilisearch on a not-so-good NVMe drive and a slow HDD, and oh boy, the SSD is so much faster.
I am sending hundreds of thousands of messages and changes (to the likes count) into Meilisearch, and so far, so good. It's been a month, and everything is working fine. We also shipped the new batches stats, which expose a lot of internal information about indexing step timings [1] to help us prioritize.
[1]: https://github.com/meilisearch/meilisearch/pull/5356#issue-2...
- Meilisearch recently improved both indexing speed and the upgrade path: v1.12 made indexing much faster [1], and the dumpless upgrade feature simplified upgrades [2].
The main advantage of Meilisearch is that the content is written to disk. Rebooting an instance is instant, and that's quite useful when booting from a snapshot or upgrading to a smaller or larger machine. We think disk-first is a great approach as the user doesn't fear reindexing when restarting the program.
That's where Meilisearch's dumpless upgrade is excellent: all the content you've previously indexed is still written to disk and is only slightly modified to be compatible with the latest engine version. This differs from Typesense, where upgrades necessitate reindexing the documents in memory. I don't know about embeddings. Do you have to query OpenAI again when upgrading? Meilisearch keeps the embeddings on disk to avoid the cost and the reindexing time.
[1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...
[2]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...
- > I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).
You know that Meilisearch is the way to go, right? Tantivy, even though I love the product, doesn't support vector search. Meilisearch's hybrid search is stunningly good. You can try it on our demo [1].
- Meilisearch decided to use hybrid search and avoid fusion ranking. We plan to work on reranking soon, but as far as I know, our hybrid search is so good that nobody has asked for reranking. You can read more about our hybrid search in our blog post [1]; a minimal query sketch also follows below.
Regarding streaming ingestion support: Meilisearch supports basic HTTP requests and is capable of batching tasks to index them faster. In v1.12 [2], we released our new indexer, which is much faster, makes heavy use of parallel processing, and reduces disk writes.
[1]: https://www.meilisearch.com/blog/hybrid-search
[2]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...
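For reference, a minimal hybrid query sketch in Rust: the `movies` index, the embedder name `"default"`, and the key are assumptions; the `hybrid` object with `semanticRatio` and `embedder` is the documented search parameter, but verify it against your version.

```rust
// Minimal hybrid-search sketch (hypothetical index, embedder name, and key).
// Requires the `reqwest` (blocking + json) and `serde_json` crates.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let results: serde_json::Value = client
        .post("http://localhost:7700/indexes/movies/search")
        .header("Authorization", "Bearer SEARCH_KEY")
        .json(&json!({
            "q": "feel-good movie about unlikely friendships",
            // 0.0 = keyword-only, 1.0 = semantic-only; in between blends both.
            "hybrid": { "semanticRatio": 0.7, "embedder": "default" }
        }))
        .send()?
        .error_for_status()?
        .json()?;
    println!("{results:#}");
    Ok(())
}
```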
- Meilisearch has been production-ready since v1.0. I made it in Rust to ensure it stays production-ready for years and years. Memory-safe languages are here to replace unsafe ones like C++ and to reduce the number of security breaches you expose in production.
Here is an article by Google showing the benefits of using memory-safe languages in production rather than others; it explicitly revolves around Rust [1].
[1]: https://www.chromium.org/Home/chromium-security/memory-safet...
- Thank you very much! We put a lot of effort into our documentation, and be ready for the next version of it, coming soon: the experience will be even better and faster. We also put much effort recently into simplifying how people can migrate to the next engine version with the [dumpless upgrade feature][1]. We also stabilized our full-Rust vector store and hybrid search (AI-powered search) feature in v1.13.
[1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...
- Nope, it doesn't. It's based on Cascade Ranking, also called [bucket sorting][1]. We released our new Hybrid search ranking system, combining the best full-text search results (our Cascade Ranking) with semantic results (with arroy, our full-Rust Vector Store). You can try that at https://wheretowatch.meilisearch.com.
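To give an idea of what cascade ranking / bucket sorting means, here is a toy Rust sketch (illustrative only, not Meilisearch's actual code): criteria are applied in order, and each one only re-splits the groups of documents left tied by the previous criteria, so later, more expensive criteria only look at the remaining ties.

```rust
// Toy cascade ranking / bucket sorting sketch (not Meilisearch's implementation).
fn cascade_rank<T>(candidates: Vec<T>, criteria: &[fn(&T) -> u32]) -> Vec<T> {
    // Start with a single bucket containing every candidate document.
    let mut buckets: Vec<Vec<T>> = vec![candidates];

    for criterion in criteria {
        let mut next: Vec<Vec<T>> = Vec::new();
        for bucket in buckets {
            if bucket.len() <= 1 {
                // Nothing left to disambiguate in this bucket.
                next.push(bucket);
                continue;
            }
            // Score only the documents that are still tied, best (highest) first.
            let mut scored: Vec<(u32, T)> =
                bucket.into_iter().map(|doc| (criterion(&doc), doc)).collect();
            scored.sort_by(|a, b| b.0.cmp(&a.0));
            // Group consecutive equal scores into smaller buckets.
            let mut prev_score = None;
            for (score, doc) in scored {
                if prev_score != Some(score) {
                    next.push(Vec::new());
                    prev_score = Some(score);
                }
                next.last_mut().unwrap().push(doc);
            }
        }
        buckets = next;
    }

    // Concatenating the buckets, in order, gives the final ranking.
    buckets.into_iter().flatten().collect()
}

fn main() {
    // Toy documents: (doc id, query words matched, number of typos).
    let docs = vec![(101, 3, 1), (102, 3, 0), (103, 2, 0)];

    // Criteria in cascade order: more matched words first, then fewer typos.
    let criteria: &[fn(&(u32, u32, u32)) -> u32] = &[
        |d| d.1,            // more matched words is better
        |d| u32::MAX - d.2, // fewer typos is better
    ];

    // Expected order: 102 (3 words, 0 typos), 101 (3 words, 1 typo), 103 (2 words).
    println!("{:?}", cascade_rank(docs, criteria));
}
```

The useful property is that a document's final position is decided by the first criterion that distinguishes it from the others, which is exactly the bucket-sorting behavior described above.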
- Very interesting. I am wondering about the "level of translation": is it using plain safe Rust, or is it full of unsafe functions everywhere?
I planned to port LMDB to Rust by hand, "just to see", and it was awful work. There are too many defines and too much conditional compilation due to the nature of the task: interfacing with different OSes to write an on-disk B+Tree...
- I agree that thinking about code and your day-to-day work is not doing vacations correctly. On the other hand, I hiked and spent time with my family and many dogs. It made me think more outside the box than I usually do. Also, I like to write and release articles while on vacation.
- Hey, article author, co-founder and tech lead at Meilisearch here. If you have any questions, don't hesitate.
- > My main gripe is how slow to display their API documentation is. I don't know how they managed to make a text only website take 3 or 4 seconds per link.
Fortunately, the Meilisearch documentation website is no longer the slowest website of all time! https://x.com/striftcodes/status/1823637020121440305?s=46&t=...
- Meilisearch only sends anonymized telemetry events. We only send API endpoint usage; nothing like raw documents goes over the wire. You can look at the exhaustive list of all collected data on our website [1].
[1]: https://www.meilisearch.com/docs/learn/what_is_meilisearch/t...
- Indeed, there could be cases where you see a higher deserialization cost than with raw lists of integers. But when it comes to a large number of integers, I can confirm that it is much more efficient.
- > Interesting. Could you elaborate on the benefit of this?
I don't know what I can elaborate on.
Storing integers that are near each other is much more compact in a RoaringBitmap than in a flat array. The reason is that it stores the high 16 bits of the integers only once per container, and the low 16 bits in either an array or a bitmap, whichever is more efficient (a rough size comparison is sketched below).
Also, we already use RoaringBitmaps at the other end of Meilisearch, and converting them to another data structure could take a lot of time.
That would also be a good fit for [the new delta-encoded posting lists I am working on](https://github.com/meilisearch/meilisearch/pull/5985). Let's see how good it can get. My early benchmarks showed a 50% reduction in disk usage.
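To make the size argument concrete, here is a small sketch using the `roaring` crate; the id range is arbitrary and just meant to mimic dense, nearby document ids.

```rust
// Compare the serialized size of a RoaringBitmap with a flat Vec<u32>
// for one million nearby document ids. Requires the `roaring` crate.
use roaring::RoaringBitmap;

fn main() {
    // One million document ids clustered in a contiguous range.
    let ids: Vec<u32> = (10_000_000..11_000_000).collect();

    let bitmap: RoaringBitmap = ids.iter().copied().collect();

    let flat_bytes = ids.len() * std::mem::size_of::<u32>(); // 4 bytes per id
    let roaring_bytes = bitmap.serialized_size();            // compact containers

    println!("flat list : {flat_bytes} bytes");
    println!("roaring   : {roaring_bytes} bytes");
}
```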