Comment by joshstrange

joshstrange 4 days ago parent

Every time I use Clickhouse I want blow my brains out, especially knowing that Postgres exists. I’m not saying Clickhouse doesn’t have its place or that Postgres can do everything that Clickhouse can.

What I am saying is that I really dislike working in Clickhouse with all of the weird foot guns. Unless you are using it in a very specific, and in my opinion, limited way, it feels worse than Postgres in every way.

mdaniel 4 days ago

Anything in my life that uses Zookeeper or its dumbass etcd friend means I'm going to have a real bad time. I am thankful they're at least shipping their own ZK-ish but it seems to have fallen into the same trap as etcd, where membership has to be managed like the precious little pets that they are https://clickhouse.com/docs/guides/sre/keeper/clickhouse-kee...

jiggawatts 4 days ago

Zookeeper in the only clustering product I’ve ever used that actively refused to start a cluster after an all-nodes stop/start.

It blows my mind that a high availability system would purposefully prevent availability as a “feature”.

sciurus 3 days ago

Although this is oversimplifying things [0], in the face of partitions zookeeper emphasizes consistency over availability.

[0] https://martin.kleppmann.com/2015/05/11/please-stop-calling-...

jiggawatts 3 days ago

The problem with that is all nodes stop-start is not a partition!

A partition is when some nodes can’t reach other nodes.

Zookeeper instead has an issue where it does try to restart but the timeout (why?!) is too short, something like 30 seconds. If the majority of your nodes don’t all start within a certain time window the whole cluster stays down until someone manually intervenes.

I discovered this fun feature when keeping non-prod systems off to save money in the cloud.

It also has an impact when making certain big bang changes in production.

valyala 3 days ago

Just don't use ClickHouse for OLTP tasks. ClickHouse is an analytical database, which isn't optimized for transactional workloads. Keep calm and use Postgresql for OLTP, and ClickHouse for OLAP.

atemerev 4 days ago

I mostly need analytics, all data is immutable and append-only.

joshstrange OP 4 days ago

And that’s exactly the limited-ness I’m talking about. If that works for you, Clickhouse is amazing. For things like logs I can 100% see the value.

Other data that is ETL’d and might need to update? That sucks.

atemerev 4 days ago

If you can afford rare, batched updates, it sucks much less.

Anyway, yes, if your data is highly mutable, or you cannot do batch writes, then yes, Clickhouse is a wrong choice. Otherwise... it is _really_ hard to ignore 50x (or more) speedup.

Logs, events, metrics, rarely updated things like phone numbers or geocoding, archives, embeddings... Whoooop — it slurps entire Reddit in 48 seconds. Straight from S3. Magic.

If you still want really fast analytics, but have more complex scenarios and/or data loading practices, there's also Kinetica... if you can afford the price. For tiny datasets (a few terabytes), DuckDB might be a great choice too. But Postgres is usually a wrong thing to make work.

edmundsauto 4 days ago

There are design patterns / architectures that data engineers often employ to make this less "sucky". Data modeling is magical! (Specifically talking about things like datelist and cumulative tables)

slt2021 4 days ago

you are doing data warehousing wrong, need to learn basics of data warehousing best practices.

Data Warehouse consists of Slowly Changing Dimensions and Facts. none of these require updates

This item has no comments currently.