I have GitHub and Twitter accounts. I'll let you guess my handles.
- Someone reported it, and I answered today [1]. The rule is too strict on the frontend side, and we will fix it by using a better hybrid search setup (not only semantic). Thank you for the report.
[1]: https://github.com/meilisearch/meilisearch/issues/5504#issue...
- v1.14, released yesterday [1], ships with a search embedding cache. Most of the time you see is spent waiting for an OpenAI embedding response. We also just shipped composite embedders to reduce network latency when you need to respond quickly to user searches (by running embedders on the Meilisearch server) while still using external APIs to index many documents in batches. Note that this only works with open-source embedders, the ones HuggingFace serves (a rough configuration sketch follows below).
[1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...
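To make the composite embedder idea concrete, here is a rough sketch of what the embedder settings could look like, sent from Rust with `reqwest` and `serde_json`. The index name, model, endpoint, and the exact field names (`searchEmbedder`, `indexingEmbedder`, the `rest` templates) are my assumptions from the release notes, so verify them against the official documentation before using any of this.

```rust
// Hypothetical sketch only (field names and endpoint are assumptions):
// a composite embedder whose search side runs an open-source model locally
// (low latency at query time) while the indexing side calls a remote API
// serving the *same* model for large batches.
// Requires the `reqwest` (blocking + json) and `serde_json` crates.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let settings = json!({
        "embedders": {
            "default": {
                "source": "composite",
                // Local embedder used at search time, on the Meilisearch server.
                "searchEmbedder": {
                    "source": "huggingFace",
                    "model": "BAAI/bge-small-en-v1.5"
                },
                // Remote embedder used while indexing documents in batches.
                // It must produce the same embeddings as the search side,
                // hence the open-source-models-only restriction.
                "indexingEmbedder": {
                    "source": "rest",
                    // The `rest` source also needs request/response templates
                    // mapping your endpoint's payload shape (see the docs).
                    "url": "https://example.com/embed"
                }
            }
        }
    });

    let client = reqwest::blocking::Client::new();
    client
        .patch("http://localhost:7700/indexes/movies/settings")
        .header("Authorization", "Bearer MASTER_KEY")
        .json(&settings)
        .send()?
        .error_for_status()?;
    Ok(())
}
```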
- HuggingFace has been using Meilisearch in production on their website for a year now.
- You should try Meilisearch, then; you'll be astonished by the quality of the results and the ease of setup.
- Right. We released a lot of new versions of the engine to improve its indexing. v1.12 improved document indexing a lot! Have you tried the latest version, v1.14, which we released yesterday?
While Meilisearch is capable of limiting its resident memory (actual mallocs), it requires a bare minimum (about 1 GiB).
- Meilisearch is faster when you reduce the dataset by filtering it (a minimal query sketch follows below). I wrote an article on this subject [1].
[1]: https://blog.kerollmops.com/meilisearch-vs-qdrant-tradeoffs-...
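To illustrate the point, here is a minimal sketch of a filtered search in Rust with `reqwest` and `serde_json`. The `movies` index, the attribute names, and the keys are made up for the example; the routes and filter syntax are the documented ones, but double-check them against your Meilisearch version.

```rust
// Minimal filtered-search sketch (hypothetical index, attributes, and keys).
// Requires the `reqwest` (blocking + json) and `serde_json` crates.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();

    // One-time setting: declare which attributes can be used in filters.
    // (Settings updates are asynchronous tasks; wait for the task in real code.)
    client
        .put("http://localhost:7700/indexes/movies/settings/filterable-attributes")
        .header("Authorization", "Bearer MASTER_KEY")
        .json(&json!(["genres", "release_date"]))
        .send()?
        .error_for_status()?;

    // The filter narrows the candidate set before ranking, which is why the
    // query gets cheaper than searching the whole dataset.
    let results: serde_json::Value = client
        .post("http://localhost:7700/indexes/movies/search")
        .header("Authorization", "Bearer MASTER_KEY")
        .json(&json!({
            "q": "space opera",
            "filter": "genres = 'Science Fiction' AND release_date >= 1577836800"
        }))
        .send()?
        .error_for_status()?
        .json()?;

    println!("{results:#}");
    Ok(())
}
```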
- 35 GiB is probably a third of the data I index into Meilisearch just for experimenting, and don't forget about the inverted indexes. You wouldn't want to use an O(n) algorithm to search your documents.
Also, every time you need to reboot the engine, you would have to reindex everything from scratch. Not a good strategy, believe me.
- > [..] to simplify the setup?
It would be simpler to keep Meilisearch and its key-value store out of Postgres' WAL and the like, and instead provide a good SQL exporter (which is in the plan).
- The best thing you can do is put Meilisearch on a very good NVMe drive. I am indexing large streams of content (Bsky posts + likes), and I assure you that I tested Meilisearch on a not-so-good NVMe drive and a slow HDD, and oh boy, the SSD is so much faster.
I am sending hundreds of thousands of messages and changes (to the likes count) into Meilisearch, and so far, so good. It's been a month, and everything is working fine. We also shipped the new batches stats, which expose a lot of internal information about indexing step timings [1] to help us prioritize.
[1]: https://github.com/meilisearch/meilisearch/pull/5356#issue-2...
- Meilisearch recently improved both indexing speed and the upgrade path: v1.12 made indexing much faster [1], and the dumpless upgrade feature simplified upgrades [2].
The main advantage of Meilisearch is that the content is written to disk. Rebooting an instance is instant, and that's quite useful when booting from a snapshot or upgrading to a smaller or larger machine. We think disk-first is a great approach as the user doesn't fear reindexing when restarting the program.
That's where Meilisearch's dumpless upgrade is excellent: all the content you've previously indexed is still written to disk and is only slightly modified to be compatible with the latest engine version. This differs from Typesense, where upgrades necessitate reindexing the documents in memory. I don't know about embeddings. Do you have to query OpenAI again when upgrading? Meilisearch keeps the embeddings on disk to avoid the cost and the reindexing time.
[1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...
[2]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...
- > I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).
You know that Meilisearch is the way to go, right? Tantivy, even though I love the product, doesn't support vector search. Meilisearch's hybrid search is stunningly good. You can try it on our demo [1].
- Meilisearch decided to use hybrid search and avoid fusion ranking. We plan to work on reranking soon, but as far as I know, our hybrid search is so good that nobody has asked for reranking. You can read more about our hybrid search in our blog post [1]; a minimal query sketch also follows below.
Regarding streaming ingestion support: Meilisearch supports basic HTTP requests and is capable of batching tasks to index them faster. In v1.12 [2], we released our new indexer, which is much faster, makes heavy use of parallel processing, and reduces disk writes.
[1]: https://www.meilisearch.com/blog/hybrid-search
[2]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...
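For reference, a minimal hybrid query sketch in Rust: the `movies` index, the embedder name `"default"`, and the key are assumptions; the `hybrid` object with `semanticRatio` and `embedder` is the documented search parameter, but verify it against your version.

```rust
// Minimal hybrid-search sketch (hypothetical index, embedder name, and key).
// Requires the `reqwest` (blocking + json) and `serde_json` crates.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let results: serde_json::Value = client
        .post("http://localhost:7700/indexes/movies/search")
        .header("Authorization", "Bearer SEARCH_KEY")
        .json(&json!({
            "q": "feel-good movie about unlikely friendships",
            // 0.0 = keyword-only, 1.0 = semantic-only; in between blends both.
            "hybrid": { "semanticRatio": 0.7, "embedder": "default" }
        }))
        .send()?
        .error_for_status()?
        .json()?;
    println!("{results:#}");
    Ok(())
}
```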
- Meilisearch has been production-ready since v1.0. I made it in Rust to ensure it stays production-ready for years and years. Memory-safe languages are here to replace unsafe ones like C++ and to reduce the number of security breaches you expose in production.
Here is an article by Google showing the benefits of using memory-safe languages in production rather than others; it explicitly revolves around Rust [1].
[1]: https://www.chromium.org/Home/chromium-security/memory-safet...
- Thank you very much! We put a lot of effort into our documentation, and be ready for the next version of it, coming soon: the experience will be even better and faster. We also put much effort recently into simplifying how people can migrate to the next engine version with the [dumpless upgrade feature][1]. We also stabilized our full-Rust vector store and hybrid search (AI-powered search) feature in v1.13.
[1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...
- Nope, it doesn't. It's based on Cascade Ranking, also called [bucket sorting][1]. We released our new Hybrid search ranking system, combining the best full-text search results (our Cascade Ranking) with semantic results (with arroy, our full-Rust Vector Store). You can try that at https://wheretowatch.meilisearch.com.
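To give an idea of what cascade ranking / bucket sorting means, here is a toy Rust sketch (illustrative only, not Meilisearch's actual code): criteria are applied in order, and each one only re-splits the groups of documents left tied by the previous criteria, so later, more expensive criteria only look at the remaining ties.

```rust
// Toy cascade ranking / bucket sorting sketch (not Meilisearch's implementation).
fn cascade_rank<T>(candidates: Vec<T>, criteria: &[fn(&T) -> u32]) -> Vec<T> {
    // Start with a single bucket containing every candidate document.
    let mut buckets: Vec<Vec<T>> = vec![candidates];

    for criterion in criteria {
        let mut next: Vec<Vec<T>> = Vec::new();
        for bucket in buckets {
            if bucket.len() <= 1 {
                // Nothing left to disambiguate in this bucket.
                next.push(bucket);
                continue;
            }
            // Score only the documents that are still tied, best (highest) first.
            let mut scored: Vec<(u32, T)> =
                bucket.into_iter().map(|doc| (criterion(&doc), doc)).collect();
            scored.sort_by(|a, b| b.0.cmp(&a.0));
            // Group consecutive equal scores into smaller buckets.
            let mut prev_score = None;
            for (score, doc) in scored {
                if prev_score != Some(score) {
                    next.push(Vec::new());
                    prev_score = Some(score);
                }
                next.last_mut().unwrap().push(doc);
            }
        }
        buckets = next;
    }

    // Concatenating the buckets, in order, gives the final ranking.
    buckets.into_iter().flatten().collect()
}

fn main() {
    // Toy documents: (doc id, query words matched, number of typos).
    let docs = vec![(101, 3, 1), (102, 3, 0), (103, 2, 0)];

    // Criteria in cascade order: more matched words first, then fewer typos.
    let criteria: &[fn(&(u32, u32, u32)) -> u32] = &[
        |d| d.1,            // more matched words is better
        |d| u32::MAX - d.2, // fewer typos is better
    ];

    // Expected order: 102 (3 words, 0 typos), 101 (3 words, 1 typo), 103 (2 words).
    println!("{:?}", cascade_rank(docs, criteria));
}
```

The useful property is that a document's final position is decided by the first criterion that distinguishes it from the others, which is exactly the bucket-sorting behavior described above.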
- Very interesting. I am wondering about the "level of translation": is it using plain safe Rust, or is it full of unsafe functions everywhere?
I planned to port LMDB to Rust by hand, "just to see", and it was awful work. There are too many defines and too much conditional compilation due to the nature of the task: interfacing with different OSes to write an on-disk B+Tree...
- I agree that thinking about code and your day-to-day work is not doing vacations correctly. On the other hand, I hiked and spent time with my family and many dogs. It made me think more outside the box than I usually do. Also, I like to write and release articles while on vacation.
- Hey, article author, co-founder and tech lead at Meilisearch here. If you have any questions, don't hesitate.
- > My main gripe is how slow to display their API documentation is. I don't know how they managed to make a text only website take 3 or 4 seconds per link.
Fortunately, the Meilisearch documentation website is no longer the slowest website of all time! https://x.com/striftcodes/status/1823637020121440305?s=46&t=...
- Meilisearch only sends anonymized telemetry events. We only send API endpoint usage; nothing like raw documents goes over the wire. You can look at the exhaustive list of all collected data on our website [1].
[1]: https://www.meilisearch.com/docs/learn/what_is_meilisearch/t...
- Indeed, there could be cases where you see a higher deserialization cost than with raw lists of integers. But when it comes to a large number of integers, I can confirm that it is much more efficient.
- > Interesting. Could you elaborate on the benefit of this?
I don't know what I can elaborate on.
Storing integers that are near each other is much more compact in a RoaringBitmap than in a flat array. The reason is that it stores the high 16 bits of the integers only once per container, and the low 16 bits in either an array or a bitmap, whichever is more efficient (a rough size comparison is sketched below).
Also, we already use RoaringBitmaps at the other end of Meilisearch, and converting them to another data structure could take a lot of time.
That would also be a good fit for [the new delta-encoded posting lists I am working on](https://github.com/meilisearch/meilisearch/pull/5985). Let's see how good it can get. My early benchmarks showed a 50% reduction in disk usage.
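To make the size argument concrete, here is a small sketch using the `roaring` crate; the id range is arbitrary and just meant to mimic dense, nearby document ids.

```rust
// Compare the serialized size of a RoaringBitmap with a flat Vec<u32>
// for one million nearby document ids. Requires the `roaring` crate.
use roaring::RoaringBitmap;

fn main() {
    // One million document ids clustered in a contiguous range.
    let ids: Vec<u32> = (10_000_000..11_000_000).collect();

    let bitmap: RoaringBitmap = ids.iter().copied().collect();

    let flat_bytes = ids.len() * std::mem::size_of::<u32>(); // 4 bytes per id
    let roaring_bytes = bitmap.serialized_size();            // compact containers

    println!("flat list : {flat_bytes} bytes");
    println!("roaring   : {roaring_bytes} bytes");
}
```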