It blows my mind that a high availability system would purposefully prevent availability as a “feature”.
[0] https://martin.kleppmann.com/2015/05/11/please-stop-calling-...
A partition is when some nodes can’t reach other nodes.
ZooKeeper instead has an issue where it does try to restart, but the timeout (why?!) is too short, something like 30 seconds. If a majority of your nodes don’t all come up within that window, the whole cluster stays down until someone manually intervenes.
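For the curious: if I remember right, the window is governed by a couple of knobs in zoo.cfg (the values below are the ones from the shipped sample config, shown for illustration, not as a recommendation):

    # zoo.cfg (illustrative values from the stock sample config)
    tickTime=2000    # length of one "tick", in milliseconds
    initLimit=10     # ticks a follower gets to connect and sync to a leader at startup
    syncLimit=5      # ticks a follower may fall behind before it is dropped
    # initLimit * tickTime is roughly 20s here; if a majority never syncs
    # inside that kind of window, you end up in the manual-intervention
    # situation described above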
I discovered this fun feature while keeping non-prod systems switched off to save money in the cloud.
It also bites when you make certain big-bang changes in production.
Other data that is ETL’d and might need to be updated? That sucks.
Anyway, yes: if your data is highly mutable, or you cannot do batch writes, then ClickHouse is the wrong choice. Otherwise... it is _really_ hard to ignore a 50x (or more) speedup.
Logs, events, metrics, rarely updated things like phone numbers or geocoding data, archives, embeddings... Whoooop: it slurps the entire Reddit dataset in 48 seconds. Straight from S3. Magic.
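To give a flavour of the S3 bit, here's roughly what that kind of load looks like (table, columns and bucket are made up; a real run would also need credentials or a public bucket):

    -- hypothetical schema; s3() is ClickHouse's built-in table function
    CREATE TABLE reddit_comments
    (
        id        String,
        subreddit LowCardinality(String),
        body      String,
        created   DateTime
    )
    ENGINE = MergeTree
    ORDER BY (subreddit, created);

    -- one big batch INSERT, reading compressed JSON straight out of the bucket
    INSERT INTO reddit_comments
    SELECT id, subreddit, body, created
    FROM s3('https://my-bucket.s3.amazonaws.com/reddit/*.json.gz', 'JSONEachRow');

That's also the shape of write it wants: a handful of huge batches rather than a trickle of single-row INSERTs.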
If you still want really fast analytics but have more complex scenarios and/or data-loading practices, there's also Kinetica... if you can afford the price. For tiny datasets (a few terabytes), DuckDB might be a great choice too. But Postgres is usually the wrong tool to try to force into this job.
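On the DuckDB point, the appeal is how little ceremony there is; something like this (bucket and path made up, S3 credentials assumed to be configured) queries Parquet files where they sit:

    -- DuckDB: no loading step, just point it at the files
    INSTALL httpfs;
    LOAD httpfs;

    SELECT subreddit, count(*) AS n
    FROM read_parquet('s3://my-bucket/reddit/*.parquet')
    GROUP BY subreddit
    ORDER BY n DESC
    LIMIT 20;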
What I am saying is that I really dislike working in ClickHouse, with all of its weird footguns. Unless you are using it in a very specific and, in my opinion, limited way, it feels worse than Postgres in every way.