Thanks for your insight. I got ratioed to fuck for trying to defend the standpoint that expecting a regular engineer to stand this up correctly is unusual.
If you're referring to the downvotes on https://www.hackerneue.com/item?id=42873211, I think that comment would have done better if you had omitted the swipes, as the site guidelines ask: https://news.ycombinator.com/newsguidelines.html.
e.g. "You are, in typical HN style, minimising the problem into insignificance" and "love how this is getting ratioed by egotistical self confessed x10 engineers". This is the sort of thing commenters here are asked to edit out of their comment, and when they don't, it's correct to downvote them (even though your underlying points may otherwise be correct).
lol, nice. getting out in front of anyone even potentially pointing fingers at ClickHouse. Good initiative.
If someone wants to configure unauthenticated access from the Internet, they have to take the following extra steps:
- enable listening on the wildcard address;
- remove IP filtering for the default user;
- set up no-password authentication.
It is possible to ignore and turn off all the guardrails that the system has by default, but it takes extra effort. However, it's possible that someone copy-pasted a wrong configuration file from somewhere without knowing what is inside, or did something like listening on localhost but exposing the ports from Docker.
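To make the three steps above concrete, here is a rough sketch of a check that flags them in a pair of configuration files. It is not an official tool. The element names (`listen_host`, `users/<name>/networks/ip`, `password`) are my understanding of the ClickHouse XML layout and should be verified against your version; the function name `audit` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed spellings of "everything" in the two settings; adjust as needed.
WILDCARD_HOSTS = {"::", "0.0.0.0"}
OPEN_NETWORKS = {"::/0", "0.0.0.0/0"}


def audit(server_xml: str, users_xml: str, user: str = "default") -> list[str]:
    """Return findings for the three steps that open a server to the Internet."""
    findings = []

    # Step 1: listening on a wildcard address.
    server = ET.fromstring(server_xml)
    hosts = {(el.text or "").strip() for el in server.iter("listen_host")}
    if hosts & WILDCARD_HOSTS:
        findings.append("listens on a wildcard address")

    users = ET.fromstring(users_xml)
    entry = users.find(f"./users/{user}")
    if entry is None:
        return findings

    # Step 2: no IP filtering for the user.
    nets = {(el.text or "").strip() for el in entry.findall("./networks/ip")}
    if nets & OPEN_NETWORKS:
        findings.append(f"user '{user}' accepts connections from any address")

    # Step 3: an empty <password> element (assumed to mean no password).
    password = entry.find("password")
    if password is not None and not (password.text or "").strip():
        findings.append(f"user '{user}' has an empty password")

    return findings


# Hypothetical sample configs, not taken from any real deployment.
RISKY_SERVER = "<clickhouse><listen_host>::</listen_host></clickhouse>"
RISKY_USERS = (
    "<clickhouse><users><default><password></password>"
    "<networks><ip>::/0</ip></networks></default></users></clickhouse>"
)
SAFE_SERVER = "<clickhouse><listen_host>127.0.0.1</listen_host></clickhouse>"
SAFE_USERS = (
    "<clickhouse><users><default><password_sha256_hex>00</password_sha256_hex>"
    "<networks><ip>127.0.0.1</ip></networks></default></users></clickhouse>"
)

for finding in audit(RISKY_SERVER, RISKY_USERS):
    print("WARNING:", finding)
```

Note that this only looks at the config files. It would not catch the Docker case, where the port mapping lives outside the ClickHouse configuration.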
A use case for direct database access exists, and is acceptable, assuming you set up a readonly user, grant access to specific tables, limit queries by complexity, and limit total usage by quotas. This is demonstrated by the following public services:
https://play.clickhouse.com/
https://adsb.exposed/
https://reversedns.space/
In this way, ClickHouse can be used to implement public data APIs (which is probably not what DeepSeek wanted).
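As a rough illustration of that recipe (read-only user, per-table grants, complexity limits, quotas), here is a small sketch that generates the corresponding SQL statements. The statement shapes follow ClickHouse's `CREATE USER`, `GRANT` and `CREATE QUOTA` syntax as I understand it; the helper name, the user and table names, and all the limit values are made up for the example and are not what the public services above actually use.

```python
import hashlib
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name: str) -> str:
    """Allow only plain identifiers so nothing needs quoting in the SQL."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def readonly_user_statements(user, password, tables, max_seconds=10,
                             max_rows=1_000_000_000, queries_per_hour=1000):
    """Build SQL for a read-only user limited to specific tables (a sketch)."""
    user = _ident(user)
    # Send the hash rather than the plaintext password.
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    statements = [
        f"CREATE USER IF NOT EXISTS {user} "
        f"IDENTIFIED WITH sha256_hash BY '{digest}' "
        # readonly = 1 plus per-query complexity limits (illustrative values).
        f"SETTINGS readonly = 1, max_execution_time = {int(max_seconds)}, "
        f"max_rows_to_read = {int(max_rows)}"
    ]
    # Grant SELECT only on the listed "database.table" names.
    for table in tables:
        database, _, name = table.partition(".")
        statements.append(
            f"GRANT SELECT ON {_ident(database)}.{_ident(name)} TO {user}"
        )
    # Cap total usage with an hourly quota.
    statements.append(
        f"CREATE QUOTA IF NOT EXISTS {user}_quota "
        f"FOR INTERVAL 1 hour MAX queries = {int(queries_per_hour)} TO {user}"
    )
    return statements


# Hypothetical names, for illustration only.
for statement in readonly_user_statements("demo_reader", "s3cret",
                                          ["public_db.events"]):
    print(statement + ";")
```

The statements are only printed here; you would review them and run them through your usual client.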
ClickHouse has a wide range of security and access control features: authentication with SSL certificates or SSH keys; simple password-based auth that supports bcrypt and short-lived credentials; integration with LDAP and Kerberos; network-level restrictions for every authentication method; full Role-Based Access Control; fine-grained restrictions on query complexity and resource consumption; and user quotas.
But still, according to Shodan, there are 33,000 misconfigured ClickHouse servers on the Internet: https://www.shodan.io/search?query=clickhouse. This can be attributed to the high popularity of ClickHouse (it is the most widely used analytic DBMS).
When you use ClickHouse Cloud, a commercial cloud service based on the open-source ClickHouse database (https://clickhouse.com/cloud), it ensures the needed security measures and strengthens the already strong defaults: TLS, strong credentials, IP filtering; plus it allows private link, data encryption with customer keys, etc.