Profile: philippemnoel

philippemnoel

Joined Feb 19, 2019 831 karma

Find me at https://linktree.com/philippemnoel

Jan 3, 2026

ParadeDB (YC S23) Is Hiring Database Engineers

philippemnoel notion.site
philippemnoel Dec 12, 2025 parent

That's true. For this reason, most modern search engines support language-aware stemming and tokenization. Popular tokenizers for CJK languages include Lindera and Jieba.
We (ParadeDB) use a search library called Tantivy under the hood, which supports stemming in Finnish, Danish and many other languages: https://docs.paradedb.com/documentation/token-filters/stemmi...
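To make the idea concrete, here is a minimal sketch of a tokenization pipeline with a stemming step. It is a toy and not how Tantivy or ParadeDB implement it: the stop-word list, the suffix rules, and the names `naive_stem` and `tokenize` are all assumptions made up for illustration. Note that splitting on non-alphanumeric runs only works for languages that separate words with spaces or punctuation, which is why CJK text needs a dedicated segmenter such as Lindera or Jieba.

```python
import re

# Hypothetical stop-word list and suffix rules, for illustration only.
# Real stemmers use language-specific rule sets that are far more careful.
STOP_WORDS = {"the", "a", "an", "and", "of", "to"}
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumeric runs, drop stop words, then stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(word) for word in words if word not in STOP_WORDS]
```

With these rules, "Searching", "searched" and "searches" all reduce to the same token, so a query for any one of them can match documents containing the others.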
134 points Dec 11, 2025

From text to token: How tokenization pipelines work

21 comments philippemnoel paradedb.com
philippemnoel Dec 1, 2025 parent

ParadeDB | https://paradedb.com | SF Onsite + Remote | Full-Time | Rust Database Engineers
ParadeDB is an alternative to Elasticsearch built on Postgres. We're building a Postgres extension in Rust that offers a new index type optimized for full-text search and aggregate/analytics workloads. We solve three problems with Elasticsearch today:
- Lack of read-after-write guarantees
- Lack of JOINs
- Infrastructure complexity & cost due to syncing Postgres and Elastic.
We're open-source, and our repository is available at https://github.com/paradedb/paradedb. We're a Series A team of 8 distributed across the US and Canada. Most folks on our team have 10+ years of experience in database internals at companies like Twitter, MongoDB, Oracle, Instacart, etc.
You can find our roles and the profiles of our team members here: https://paradedb.notion.site
If you know Rust and/or have experience working on DB internals and want to work on cool systems problems, shoot us a note. We hire with conviction, have lots of room to grow, and exciting technical problems to solve. My email is phil@paradedb.com.
Sep 26, 2025

ParadeDB (YC S23) Is Hiring Database Internals Engineers

philippemnoel notion.site
philippemnoel Jul 22, 2025 parent

How so? Many popular projects are AGPL. MinIO, Grafana, etc.
We wrote about this here: https://www.paradedb.com/blog/agpl
philippemnoel Jul 21, 2025 parent

The value props for customers vs Elasticsearch are:
- ACID w/ JOINs
- Real-time indexing under UPDATE-heavy workloads. Instacart wrote about this, they had to move away from Elasticsearch during COVID because of this problem: https://tech.instacart.com/how-instacart-built-a-modern-sear...
Beyond these two, the added benefits are:
- Infrastructure simplification (no need for ETL)
- Lower costs
Speaking the wire protocol is nice, but it's not worth much.
philippemnoel Jul 21, 2025 parent

Bear with me, this will be a bit of a longer answer. Today, there are two topologies under which people deploy ParadeDB.
- <some managed Postgres service> + ParadeDB. Frequently, customers already use a managed Postgres (e.g. AWS RDS) and want ParadeDB. In that world, they maintain their managed Postgres service and deploy a Kubernetes cluster running ParadeDB on the side, with one primary instance and some number of replicas. The AWS RDS primary sends data to the ParadeDB primary via logical replication. You can see a diagram here: https://docs.paradedb.com/deploy/byoc
In this topology, the OLTP and search/OLAP workloads are fully isolated from each other. You have two clusters, but you don't need a third-party ETL service since they're both "just Postgres".
- <self-hosted Postgres> + ParadeDB. Some customers, typically larger ones, prefer to self-host Postgres and want to install our Postgres extension directly. The extension is installed in their primary Postgres, and the CREATE INDEX commands must be issued on the primary; however, they may route reads only to a subset of the read replicas in their cluster.
In this topology, all writes could be directed to the primary, all OLTP read queries could be routed to a pool of read replicas, and all search/OLAP queries could be directed to another subset of replicas.
Both are completely reasonable approaches and depend on the workload. Hope this helps :)
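The routing in the self-hosted topology can be sketched roughly as follows. This is an application-side illustration under assumptions, not ParadeDB code: the `Workload` enum, the `Router` class, and the node names are hypothetical, and in practice this logic often lives in a connection pooler or proxy instead.

```python
from enum import Enum
from itertools import cycle


class Workload(Enum):
    WRITE = "write"          # INSERT/UPDATE/DELETE and DDL such as CREATE INDEX
    OLTP_READ = "oltp_read"  # regular transactional reads
    SEARCH = "search"        # full-text search and analytics queries


class Router:
    """Pick a node per query: primary for writes, separate replica pools for reads."""

    def __init__(self, primary, oltp_replicas, search_replicas):
        self.primary = primary
        # Round-robin over each pool; both pools are assumed to be non-empty.
        self._oltp = cycle(oltp_replicas)
        self._search = cycle(search_replicas)

    def route(self, workload):
        if workload is Workload.WRITE:
            return self.primary
        if workload is Workload.OLTP_READ:
            return next(self._oltp)
        return next(self._search)
```

For example, `Router("primary", ["r1", "r2"], ["s1"])` sends every write to `primary`, alternates OLTP reads between `r1` and `r2`, and sends search queries to `s1`, so a heavy search query never competes with OLTP reads for the same replica.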
philippemnoel Jul 21, 2025 parent

You don't need to. Customers usually deploy us on one or more standalone replicas in their Postgres cluster. If a query were to take a node down, it would only take down the replica(s) dedicated to ParadeDB, leaving the primary and all other read replicas dedicated to OLTP safe.
philippemnoel Jul 21, 2025 parent

Our customers typically deploy ParadeDB in a primary-replicas topology, with one primary Postgres node and 2 or more read replicas, depending on read volume. Queries are executed on a single node today, yes.
We have plans to eventually support distributed queries.
philippemnoel Jul 21, 2025 parent

Yes, Figma!
philippemnoel Jul 21, 2025 parent

One of the ParadeDB maintainers here -- Being PostgreSQL wire protocol compatible is very different from being built inside Postgres on top of the Postgres pages, which is what ParadeDB does. With a wire-compatible system you still need the "T" in ETL, i.e. transforming data from your source into the format of the sink (in your example, CrateDB). This is where ETL costs and brittleness come into play.
You can read more about it here: https://www.paradedb.com/blog/block_storage_part_one
250 points Jul 21, 2025

We made Postgres writes faster, but it broke replication

52 comments philippemnoel paradedb.com
