email: <stephan@{username}.dev>
meet.hn/city/us-Charlotte
Socials: github.com/stephanGarland
Interests: DevOps, Hardware, Databases, Open Source, Linux
---
- Perhaps this is my bias coming through. I work with DBs day in and day out, and the main problem I face is performance from poorly-designed schemas and queries; the next-largest issue is referential integrity violations causing undefined behavior. The security issues I’ve found were all people doing absurdly basic stuff, like exposing an endpoint that dumped passwords.
To me, if you’re relying on having a matching PK as security, something has already gone wrong. There are ways to provide AuthN and AuthZ other than that. And yes, “defense in depth,” but if your base layer is “we have unguessable user ids,” IME people will become complacent, and break it somewhere else in the stack.
- I have never seen anyone post an actual example of the German Tank problem creating an issue for them, only that it’s possible.
> I don’t have to think about it
And here we have the main problem behind most DB issues I deal with on a daily basis - someone didn’t want to think about the implications of what they were doing, and then it suddenly becomes my emergency because they have no idea how to address it.
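For anyone unfamiliar with the reference: the German Tank problem is that sequential IDs leak your row counts. Given a handful of observed IDs, the standard minimum-variance unbiased estimator is m + m/k - 1, where m is the largest observed ID and k is the sample size. A minimal sketch (the sampled IDs are made up):

```python
# German Tank estimator: from k observed sequential IDs with maximum m,
# estimate the total population as m + m/k - 1 (the MVU estimator).
def estimate_population(sample):
    k = len(sample)
    m = max(sample)
    return m + m / k - 1

# An observer who scraped IDs 19, 40, 68, and 94 from public URLs
# would estimate roughly 117 total rows.
print(estimate_population([19, 40, 68, 94]))  # → 116.5
```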
- The implication is that you need to know the PK ahead of time so that you can insert it into other tables which reference it as an FK without waiting for it to be returned, which further implies that you don’t have FK constraints, because the DB would disallow this.
Tbf in Postgres, you can declare FKs to be deferrable, so their existence is checked at transaction commit, rather than at insertion time.
If you don’t have the DB enforcing referential integrity, you need to be extremely careful in your application logic; IME, this inevitably fails. At some point, someone writes bad code, and you get data anomalies.
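To make the deferrable-FK point concrete, here’s a minimal sketch using SQLite, which happens to accept the same `DEFERRABLE INITIALLY DEFERRED` syntax as Postgres; the `users`/`orders` schema is made up:

```python
import sqlite3

# Autocommit mode so we control the transaction explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users (id)
            DEFERRABLE INITIALLY DEFERRED
    )
""")

conn.execute("BEGIN")
# Child row first: allowed, because the FK isn't checked until commit.
conn.execute("INSERT INTO orders (id, user_id) VALUES (1, 42)")
# The parent row arrives later in the same transaction.
conn.execute("INSERT INTO users (id) VALUES (42)")
conn.execute("COMMIT")  # FK is checked here, and passes
```

With a non-deferrable FK, the first INSERT would fail immediately; with the deferred one, only a COMMIT with the parent still missing fails.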
- Sounds like a lot of referential integrity violations.
- For InnoDB-based DBs that are not Aurora, and if the secondary index isn’t UNIQUE, it solves the problem, because secondary non-unique index changes are buffered and written in batches to amortize the random cost. If you’re hashing a guaranteed unique entity, I’d argue you can skip the unique constraint on this index.
For Aurora MySQL, it just makes it worse either way, since there’s no change buffer.
- IME, when performance issues become obvious, the devs are in growth mode and have no desire / time to revisit PK choice.
Integer PKs were seen as fine for years - decades, even - before the rise of UUIDs.
- There are plenty of ways to deal with that. You can shard by some other identifier (though I then question your table design), you can assign ranges to each shard, etc.
- Yes, and for Postgres, it also causes WAL bloat due to the high likelihood of full page writes.
- Assuming the integer value is the PK, it can in fact be much faster for MySQL / MariaDB due to InnoDB’s clustering index. If it can do a range scan over the PK, and that’s also the ORDER BY (with matching direction), congratulations, the rows are already ordered, no sort required. If it has to do a secondary index lookup to find the rows, this is not guaranteed.
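A toy model of why the sort is free, with storage simulated as a Python list; all names and sizes here are made up:

```python
# In InnoDB the table *is* the PK B-tree, so rows are physically stored
# in PK order -- modeled here as a list kept sorted by PK.
import bisect

rows = [(pk, f"row-{pk}") for pk in range(1, 1001)]

def pk_range_scan(lo, hi):
    """WHERE pk BETWEEN lo AND hi: a contiguous slice of storage."""
    i = bisect.bisect_left(rows, (lo,))
    j = bisect.bisect_left(rows, (hi + 1,))
    return rows[i:j]

# The slice comes back already in PK order, so an ORDER BY pk with
# matching direction needs no separate sort step.
result = pk_range_scan(100, 199)
assert [pk for pk, _ in result] == list(range(100, 200))
```

With a secondary-index lookup, the PKs come back in *index* order, so the fetched rows carry no such ordering guarantee.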
- > Permanent identifiers should not carry data.
Did you read the article? He doesn’t recommend natural keys, he recommends integer-based surrogates.
> A prime example of premature optimization.
Disagree. Data is sticky, and PKs especially so. Moreover, if you’re going to spend time optimizing anything early on, it should be your data model.
> Don't make decisions you will regret just to shave off a couple of milliseconds!
A bad PK in some databases (InnoDB engine, SQL Server if clustered) can cause query times to go from sub-msec to tens of msec quite easily, especially with cloud solutions where storage isn’t node-local. I don’t just mean a UUID; a BIGINT PK on a 1:M can destroy your latency for the simple reason of needing to fetch a separate page for every record. If instead the PK is a composite of (<linked_id>, id) - e.g. (user_id, id) - where id is a monotonic integer, you’ll have WAY better data locality.
Postgres suffers a different but similar problem with its visibility map lookups.
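A toy illustration of the locality difference, assuming a made-up figure of 100 rows per page: under a global monotonic PK, one user’s rows were inserted interleaved with everyone else’s over time, while a composite (user_id, id) clustering packs them together.

```python
ROWS_PER_PAGE = 100            # assumed, for illustration only
N_USERS, ROWS_PER_USER = 1000, 50

# (a) Global BIGINT PK: a user's rows are scattered across insert order.
interleaved = [(i % N_USERS, i) for i in range(N_USERS * ROWS_PER_USER)]

# (b) Composite (user_id, id) PK: a user's rows cluster contiguously.
clustered = sorted(interleaved)

def pages_for_user(table, user_id):
    # Which storage pages must be fetched to read this user's rows?
    return {pos // ROWS_PER_PAGE
            for pos, (uid, _) in enumerate(table) if uid == user_id}

print(len(pages_for_user(interleaved, 7)))  # → 50 (one page per row)
print(len(pages_for_user(clustered, 7)))    # → 1  (all rows contiguous)
```

Fifty random page fetches versus one is exactly the sub-msec-to-tens-of-msec gap when storage isn’t node-local.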
- I know this will come across as a trope, but a calculator doesn’t seem to me to be at the same level as AI in most circumstances.
If we’re talking about grade school children who are learning multiplication, then yes, a calculator is unhelpful to their education. If we’re talking about a high school physics exam, it probably doesn’t matter if you can show your work on converting units so much as it does that you knew which formulae to use.
- > how taxes worked
Given that I worked with people well before the advent of LLMs who had no idea how marginal tax rates worked, it seems like we should be more aggressively pursuing this as an educational goal.
- First of all, the entire post reads like it was written by AI.
Secondly, the author / prompter misses the point entirely with this closing paragraph:
> The next time a teacher complains about AI cheating, ask: If a machine can do this assignment perfectly, why are you giving it to this student? And then we can replace it with education and work that actually matters.
You learn fundamentals because they are necessary for you to understand how the magic works, and because that’s how the human brain works.
Is it important for you to be able to write a binary search algorithm perfectly from scratch? Not especially, no. Is it important for you to be able to describe what it’s doing, and why? Yes, very much so, because otherwise you won’t know when to use it.
If your rebuttal to this is “we can feed the problem to AI and let it figure that out,” I don’t want to live in that world; where curiosity and thought are cast aside in favor of faster results.
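To be concrete about the binary search example: the rote code matters far less than the two properties the comments below spell out.

```python
# A textbook binary search. The point isn't reproducing this from memory;
# it's knowing *why* it works (sorted input, halving the search space
# each step) and therefore when it applies.
def binary_search(sorted_items, target):
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # ...or the left half
    return -1              # precondition: the input must be sorted

print(binary_search([2, 3, 5, 7, 11, 13], 7))  # → 3
```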
- Wait until you find out about network endianness.
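For anyone who hasn’t hit this yet: network byte order is big-endian, while x86 is little-endian, and `struct` makes the difference visible in two lines.

```python
import struct

n = 0x01020304
# "!" means network (big-endian) byte order; "<" means little-endian.
assert struct.pack("!I", n) == b"\x01\x02\x03\x04"  # on the wire
assert struct.pack("<I", n) == b"\x04\x03\x02\x01"  # same int, x86 host order
print(struct.pack("!I", n).hex())  # → 01020304
```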
- Forgive me, I didn’t watch the videos: is that what the Dream Router supports - normal wired WAN uplink, plus 5G failover? If so, yes, that’s very attractive.
I have a T-Mobile backup home internet plan, and when I had a rack set up, it was my failover from fiber. The Dream Machine Pro did auto failover and failback flawlessly. However, I recently moved, and am redoing my homelab so I have no rack right now; internet is from a Dream Router, so I don’t have auto-failover. I doubt I’d buy this for the small window of time I expect to be in this situation, but if you didn’t have or want a rack, an AIO with failover would be great.
- So you’re saying, do something like this?
1. Index-only scans on t_product.{category,color} indices, summing each value
2. Look up the names of those values in their parent tables, generate output rows
If so, I suspect there are two reasons why it might not do that:
Given the relatively small size of the t_product table (23 bytes overhead + 1 byte padding + int4 + int4 + 16 bytes text + [I think] 1 byte varlena = 49 bytes/row), it will be fairly well bin-packed into pages on the heap, consuming roughly 170 pages (assuming the default 8 KiB page size and the default fillfactor of 100%). That trivially fits into a single segment file on-disk, and is a very easy sequential scan.
If it does a sequential scan on the heap, it doesn’t have to check the Visibility Map, because it already has that information in the heap itself, which avoids a second (albeit small) lookup.
Happy for someone who knows more about Postgres to correct me if I’m wrong, though!
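A back-of-the-envelope check of the row-size math above; the 23-byte tuple header and column widths come from the comment, while the 24-byte page header and 4-byte line pointer per tuple are standard Postgres page layout (alignment padding could shave this further):

```python
PAGE_SIZE, PAGE_HEADER, LINE_POINTER = 8192, 24, 4

# header + pad + int4 + int4 + text + varlena byte
row = 23 + 1 + 4 + 4 + 16 + 1
assert row == 49

rows_per_page = (PAGE_SIZE - PAGE_HEADER) // (row + LINE_POINTER)
print(rows_per_page)  # → 154
# ~170 pages therefore implies roughly 170 * 154 ≈ 26k rows,
# i.e. ~1.3 MiB of heap -- trivially sequential-scannable.
```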
- Nobody does 4 until they’ve had multiple large incidents involving DBs, or the spend gets hilariously out of control.
Then they hire DBREs because they think DBA sounds antiquated, who then enter a hellscape of knowing exactly what the root issues are (poorly-designed schemata, unperformant queries, and applications without proper backoff and graceful degradation), and being utterly unable to convince management of this (“what if we switched to $SOME_DBAAS? That would fix it, right?”).
- If you want to pray before your meal, that doesn’t affect me. If we’re eating together and you want to pray before your meal, I can stay respectfully silent for 30 seconds, because it’s not that much to ask. If you insist I join you in prayer, or that I must kneel, etc. then yes, we have a problem. I have never found the latter case to exist.
- One difference, IMO, is that in technical forums / chat rooms, flaming was generally (modulo Torvalds, but honestly I give him a pass - the guy has to keep the level of quality excruciatingly high at all times, and that must be tiring) more reserved, and was along the lines of “RTFM.”
That’s not to say there’s more vitriol today; it’s swung the opposite direction, where newbies expect to have answers handed to them, or worse, they’ll post AI slop and then be genuinely surprised when someone asks them to explain it, or to show their work.
I don’t think that people should be belittled, but I also think it’s unrealistic to expect that experienced people should patiently teach every newcomer the same things over and over, without them having put in the minimum effort of reading docs.
I’m reminded of something I saw once on Twitter from a hiring manager who said that the best signal they had was whether a candidate had a homelab. This was met with criticism from many, who bizarrely accused him of being classist, since “not everyone has time to do that for fun.”
Re: natural keys (or something like it), I was using it as an example of how badly PK choice can impact performance at scale.