Yet they still call it HA because there's nothing else. Even a planned shutdown of the primary to patch the OS results in downtime, as all connections are terminated. The situation is even worse for major database upgrades: stop the application, upgrade the database, deploy a new release of the app because some features are not compatible between versions, test, re-analyze the tables, reopen the database, and only then can users resume work.
Everything in SQL/RDBMS was designed around a single-node instance, with replicas as an afterthought. It's not HA because there can be only one read-write instance at a time. They even claim to be more ACID than MongoDB, but the ACID properties are only guaranteed on a single node.
One exception is Oracle RAC, but PostgreSQL has nothing like that. Some forks, like YugabyteDB, provide real HA with most PostgreSQL features.
About the hype: many applications that run on PostgreSQL accept hours of downtime, planned or unplanned. Those who run larger, more critical applications on PostgreSQL are big companies with many expert DBAs who can handle the complexity of database automation and who use logical replication for upgrades. But no solution offers both low operational complexity and high availability comparable to MongoDB's.
>> "God Send". Everything just worked. Replication was as reliable as one could imagine. It outlives several hardware incidents without manual intervention. It allowed cluster maintenance (software and hardware upgrades) without application downtime. I really dream PostgreSQL will be as reliable as MongoDB without need of external services.
https://www.postgresql.org/message-id/0e01fb4d-f8ea-4ca9-8c9...
Sure, the PostgreSQL HA story isn't what we all want it to be, but the reliability is exceptional.
Database engineering is very hard. MongoDB has had both poor defaults and bugs in the past. It will certainly have durability bugs in the future, just like Postgres and all other serious databases. I'm not sure that Postgres' durability stacks up especially well against modern MongoDB.
[1] https://jepsen.io/analyses/postgresql-12.3
[2] https://archive.fosdem.org/2019/schedule/event/postgresql_fs...
CloudNativePG (https://cloudnative-pg.io) is a great option if you’re using Kubernetes.
There’s also pg_auto_failover, which is a Postgres extension and a bit less complex than the alternatives, but it has its drawbacks.
FYI - it's already supported by cloudnativepg [1]
I was playing with this operator recently and I'm truly impressed - it's a piece of art when it comes to postgres automation; alongside barman [2] it does everything I need and more.
[1] https://cloudnative-pg.io/docs/1.28/connection_pooling [2] https://cloudnative-pg.io/plugin-barman-cloud/
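To give a rough sense of what that automation looks like from the API side, here's a minimal sketch of creating a three-instance cluster through the Kubernetes Python client. The name, namespace, and storage size are placeholders; the spec fields follow the basic example in the CNPG docs, but treat this as an illustration rather than a recipe:

```python
# Sketch: create a CloudNativePG Cluster custom resource via the Kubernetes
# Python client. The operator then handles bootstrap, streaming replication,
# and automated failover for the instances. Names/namespace are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

cluster = {
    "apiVersion": "postgresql.cnpg.io/v1",
    "kind": "Cluster",
    "metadata": {"name": "pg-example", "namespace": "databases"},
    "spec": {
        "instances": 3,               # one primary + two replicas
        "storage": {"size": "10Gi"},  # PVC size per instance
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="postgresql.cnpg.io",
    version="v1",
    namespace="databases",
    plural="clusters",
    body=cluster,
)
```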
Patroni has been around for a while. The database-as-a-service team where I work uses it under the hood. I used it to build database-as-a-service functionality on the infra platform team I was at prior to that.
It's basically push-button production PG.
There's at least one decent operator framework leveraging it, if that's your jam. I've been living and dying by self-hosting everything with k8s operators for about 6-7 years now.
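Part of what makes it push-button is the little REST API each Patroni node exposes (port 8008 by default), which haproxy and friends use to follow the leader across failovers. A minimal sketch of the idea, with placeholder hostnames (older Patroni versions use /master instead of /primary):

```python
# Sketch: find the current primary in a Patroni cluster by asking each node's
# REST API (default port 8008). Patroni answers /primary with HTTP 200 only on
# the leader, so load balancers (and scripts like this) follow a failover
# automatically. Hostnames are placeholders.
import requests

NODES = ["pg-node-1", "pg-node-2", "pg-node-3"]

def current_primary(nodes, port=8008, timeout=2):
    for host in nodes:
        try:
            r = requests.get(f"http://{host}:{port}/primary", timeout=timeout)
        except requests.RequestException:
            continue  # node down or unreachable, try the next one
        if r.status_code == 200:
            return host
    return None

print("primary:", current_primary(NODES))
```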
Currently scratching my head over what the appropriate upgrade procedure is for a non-k8s/operator spilo/patroni cluster, for minimal downtime and risk. The script doesn't seem to work for this setup, erroring on a mismatching PG_VERSION when attempted. If you don't mind sharing, it would be very much appreciated.
I read these and then wrote my own scripts that were tailored to my environment.
https://pganalyze.com/blog/5mins-postgres-zero-downtime-upgr...
https://www.pgedge.com/blog/always-online-or-bust-zero-downt...
https://knock.app/blog/zero-downtime-postgres-upgrades
Basically
- Created a new cluster on new machines
- Started logically replicating (roughly along the lines of the sketch below)
- Waited for that to complete and then left it there replicating for a while until I was comfortable with the setup
- We were already using haproxy and pgbouncer
- Then I did a cut over to the new setup
- Everything looked good so after a while I tore down the old cluster
- This was for a database 600 GB-1 TB in size
- The client application was not doing anything overly fancy which meant there was very little to change going from 12 to 17
- Additionally I did all of the above in a staging environment first to make sure it would work as expected
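If it helps, the logical-replication leg of that procedure boils down to roughly the following. This is just a sketch: the DSNs, publication/subscription names, and lag query are illustrative placeholders, and psycopg2 is only one way to drive it.

```python
# Rough sketch of the logical-replication leg of a major-version upgrade:
# publish everything on the old primary, subscribe from the new cluster
# (which performs the initial copy), then watch slot lag until it's safe to
# cut over. Names, DSNs, and thresholds are placeholders, not a recipe.
import psycopg2

OLD_DSN = "host=old-primary dbname=app user=postgres"
NEW_DSN = "host=new-primary dbname=app user=postgres"

with psycopg2.connect(OLD_DSN) as old:
    old.autocommit = True
    with old.cursor() as cur:
        cur.execute("CREATE PUBLICATION upgrade_pub FOR ALL TABLES;")

with psycopg2.connect(NEW_DSN) as new:
    new.autocommit = True  # CREATE SUBSCRIPTION must run outside a transaction
    with new.cursor() as cur:
        # copy_data=true (the default) performs the initial table sync
        cur.execute(
            "CREATE SUBSCRIPTION upgrade_sub "
            "CONNECTION 'host=old-primary dbname=app user=replicator' "
            "PUBLICATION upgrade_pub;"
        )

# Before cutting over (pause writes via pgbouncer, repoint haproxy, resume),
# check how far the old primary's logical slot is behind:
with psycopg2.connect(OLD_DSN) as old:
    with old.cursor() as cur:
        cur.execute(
            "SELECT slot_name, "
            "pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes "
            "FROM pg_replication_slots WHERE slot_type = 'logical';"
        )
        for slot, lag in cur.fetchall():
            print(slot, lag)
```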
Best of luck.
Going to have to do some more figuring out what's up with spilo - turns out that running it outside of k8s is rare and not well documented. But it's still patroni, so this is very helpful.
My theory of why Postgres is still getting the hype is that either people don't know about the problem, or it's acceptable on some level. I worked on a team that maintained the in-house database cluster (though we were using MySQL rather than PostgreSQL), and the HA story was pretty bad. But there were engineers who manually recovered lost data and resolved data conflicts, whether during incident recovery or in response to customer tickets. So I guess that's one way of doing business.
OTOH, Oracle takes most of my time with endless issues, bugs, unexpected feature modifications, even on OCI!
(I do use Maria at home for legacy reasons, and have used MySQL and Pg professionally for years.)
Can you give any details on that?
I switched to MariaDB back in the day for my personal projects because (so far as I could tell) it was being updated more regularly, and it was more fully open source. (I don't recall offhand at this point whether MySQL switched to a fully paid model, or just became less open.)
It’s not like Mongo gives you those properties for free either. Replication/clustering-related data loss is still incredibly common, precisely because Mongo makes it seem like all of that is handled automatically at setup, when in reality it requires plenty of manual tuning or extra software to provide the guarantees everyone thinks it does.
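As a concrete example of that tuning, the durability most people assume they're getting corresponds to settings along these lines (a sketch using pymongo; the URI and names are made up, and whether you need to set these explicitly depends on your deployment's defaults and server version):

```python
# Sketch of the kind of "manual tuning" in question: majority-acknowledged
# writes and majority read concern, rather than whatever the deployment's
# defaults happen to be. URI and names are placeholders.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://mongo-0,mongo-1,mongo-2/?replicaSet=rs0",
    w="majority",                 # write acknowledged by a majority of nodes
    journal=True,                 # and flushed to the on-disk journal
    readConcernLevel="majority",  # only read majority-committed data
    retryWrites=True,             # retry once across a failover
)

orders = client["shop"]["orders"]
orders.insert_one({"sku": "abc-123", "qty": 1})  # uses the settings above
```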
Until then it is nice to have options, even if they do require extra steps.