andres [at] hn dot anarazel dot de
- > First, although I work at Oxide, please don't think I speak for Oxide. None of this happened at Oxide. It informed some of the choices we made at Oxide and we've talked about that publicly. I try to remember to include the caveat that this information is very dated (and I made that edit immediately after my initial comment above).
I mentioned Oxide because it's come up so frequently, and at such length, on the Oxide podcast... Without that I probably wouldn't have commented here. It's one thing to comment on bad experiences, but at this point it feels more like bashing. And I feel like an open source focused company should treat other folks working on open source with a bit more, idk, respect (not quite the right word, but I can't come up with a better one right now).
I probably shouldn't have commented on this here. But I read the message after just having spent a Sunday morning looking into a problem, and I guess that made me more thin-skinned than usual.
> For most of that time (and several years earlier), the community members we reached out to were very dismissive, saying either these weren't problems, or they were known problems and we were wrong for not avoiding them, etc.
I agree that the wider community sometimes has/had the issue of excusing away postgres problems. While I try to avoid doing that, I certainly have fallen prey to that myself.
Leaving fandom-like stuff aside, there's an aspect of having been told over and over that we're doing xyz wrong and that things would never work that way, and succeeding (to some degree) regardless. While ignoring some common wisdom has been advantageous, I think there's also plenty where we've just been high on our own supply.
> What remains is me feeling triggered when it feels like users' pain is being casually dismissed.
Was that done in this thread?
- The issue is more fundamental - if you have purely random keys, there's basically no spatial locality for the index data. Which means that for decent performance your entire index needs to be in memory, rather than just recent data. And it means that you have much bigger write amplification, since it's rare that the same index page is modified multiple times close-enough in time to avoid a second write.
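A toy model can make the write-amplification point concrete. This is my own sketch, not Postgres internals: it maps each key to the leaf page that would hold it in a sorted index, then counts how many distinct pages get dirtied per batch of inserts. Sequential keys keep hitting the same rightmost pages; random keys scatter across the whole index, so nearly every page gets rewritten for each batch. The page size and batch size are arbitrary illustration numbers.

```python
import random

def writes_per_batch(keys, keys_per_page=100, batch=1000):
    # Final sorted position of every key determines its leaf page.
    pos = {k: i for i, k in enumerate(sorted(keys))}
    total = 0
    for i in range(0, len(keys), batch):
        # Pages dirtied by this batch of inserts; each dirty page costs
        # at least one write when it's evicted or checkpointed.
        total += len({pos[k] // keys_per_page for k in keys[i:i + batch]})
    return total

n = 10_000
seq = writes_per_batch(list(range(n)))                   # monotonically increasing keys
rnd = writes_per_batch(random.sample(range(10**9), n))   # "UUID-like" random keys
# seq dirties ~10 pages per batch of 1000; rnd dirties close to all 100 pages,
# i.e. roughly 10x the page writes for the same number of inserts.
```

The same scattering is why the whole index needs to stay in memory for decent read performance: with random keys, recent data shares no pages with other recent data.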
- VACUUM FULL is about cleaning up things above the level of a single page. Moving stuff around within a page doesn't allow you to reclaim space at the OS level, nor does it "compact" tuples onto fewer pages.
But it's important for normal vacuum to compact the tuples on the page, otherwise the space of deleted tuples couldn't effectively be reused. Imagine a page that's entirely filled with 100 byte tuples, and then every other tuple is deleted. Then, after a vacuum, a single 108 byte tuple is to be inserted onto the page. Without compacting the space in the page during the vacuum, there would not be any space for that larger tuple.
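That scenario can be sketched numerically (a toy free-space model, not the actual Postgres page layout; the 1000-byte page is just an illustration size):

```python
# Why plain VACUUM must compact live tuples within a page: five separate
# 100-byte holes can't hold one 108-byte tuple, one 500-byte hole can.

def max_contiguous_free(page_size, tuples):
    """Largest contiguous gap, given live tuples as (offset, length)."""
    best, cursor = 0, 0
    for off, length in sorted(tuples):
        best = max(best, off - cursor)
        cursor = off + length
    return max(best, page_size - cursor)

PAGE = 1000
# Page fully packed with ten 100-byte tuples...
tuples = [(i * 100, 100) for i in range(10)]
# ...then every other tuple is deleted.
live = [t for i, t in enumerate(tuples) if i % 2 == 0]

before = max_contiguous_free(PAGE, live)         # 100: five separate holes
# Vacuum compacts the survivors to the start of the page...
compacted = [(i * 100, 100) for i in range(len(live))]
after = max_contiguous_free(PAGE, compacted)     # 500: one contiguous gap
# ...so a 108-byte tuple now fits, which it couldn't before.
```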
- > Every time Postgres advice says to “schedule [important maintenance] during low traffic period” (OP) or “outside business hours”, it reinforces my sense that it’s not suitable for performance-sensitive data path on a 24/7/365 service and I’m not sure it really aims to be.
It's a question of resource margins. If you have regular and predictable windows of low resource utilization, you can afford to run closer to the sun during busy periods, deferring (and amortizing, to some degree) maintenance costs till later. If you have a 24/7/365 service, you need considerably higher safety margins.
Also, there's a lot of terrible advice on the internet, if you haven't noticed.
> (To be fair, running it like that for several years and desperately trying to make it work also gave me that feeling. But I’m kind of aghast that necessary operational maintenance still carries these caveats.)
To be fair, I find Oxide's continual low-info griping about postgres a bit tedious. There are plenty of weaknesses in postgres, but criticizing postgres based on 10+ year old experiences of running an, at the time, outdated postgres on an outdated OS is just ... not useful? Like, would it be useful to criticize Oxide's lack of production hardware availability in 2021 or so?
Edit: duplicated word removed
- > > When VACUUM runs, it removes those dead tuples and compacts the remaining rows within each page.
> No it doesn’t. It just removes unused line pointers and marks the space as free in the FSM.
It does:
https://github.com/postgres/postgres/blob/b853e644d78d99ef17...
Which is executed as part of vacuum.
- It's true - otherwise the space couldn't freely be reused, because the gaps for the vacuumed tuples wouldn't allow for any larger tuples to be inserted.
See https://github.com/postgres/postgres/blob/b853e644d78d99ef17...
- You can just set fsync=off if you don't want to flush to disk and are ok with corruption in case of an OS/hw level crash.
- You don't need to deal with them as a patch author :)
- If you just make a code change, you don't need to handle translations at that time. That will get done by the various translation teams closer to the release. However, you do need to make sure that the code is translatable (e.g. injecting pre-formulated English messages into a larger message is problematic).
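A minimal illustration of the translatability point, using Python's gettext as a stand-in (these are made-up messages, not actual Postgres code): translators work on whole format strings, so gluing a pre-formed English fragment into a larger message leaves them unable to reorder or inflect it.

```python
from gettext import gettext as _

def bad(obj_kind: str, name: str) -> str:
    # BAD: "index" / "table" arrive as untranslated English fragments; the
    # translator of the outer string can't decline or reorder them, and the
    # fragments themselves never appear in the message catalog.
    return _('could not open %s "%s"') % (obj_kind, name)

def good_index(name: str) -> str:
    # GOOD: one complete, independently translatable message per variant.
    return _('could not open index "%s"') % name

def good_table(name: str) -> str:
    return _('could not open table "%s"') % name
```

With no translation catalog bound, gettext returns the English msgid, so `good_index("pg_class_oid_index")` yields `could not open index "pg_class_oid_index"`; the difference only shows up once translators get involved.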
- That should be trivial to change:
https://github.com/postgres/postgres/blob/d2f24df19b7a42a094...
- We don't have the complete version history of postgres, so that's not easy to know. There definitely are still lines from Postgres95 that haven't been changed since the initial import into our repository.
Somewhere there's a CVS repository with some history from before the import into the current repository, but unfortunately there's a few years missing between that repository and the initial import. I've not done the work to analyze whether any lines from that historical repo still survive.
- It got reverted for now: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
- The docs for 18 also show it; what makes you think it's not available in 18?
- It is. I tried to repro it without success.
I wonder if it's just being executed on different VMs with slightly different performance characteristics. I can't tell based on the formulation in the post whether all the runs for one test are executed on the same VM or not.
- Afaict nothing in this benchmark will actually use AIO in 18. As of 18 there are AIO reads for seq scans, bitmap scans, vacuum, and a few other utility commands. But the queries being run should normally be planned as index range scans. We're hoping to get the work for using AIO for index scans into 19, but it could end up in 20; it's nontrivial.
It's also worth noting that the default for data checksums has changed, with some overhead due to that.
- On the 48 core system, building linux peaks at about 48GB/s; LLVM peaks at something like 25GB/s.
The system has well over 450GB/s of memory bandwidth.
- > Nowhere in my comment have I used Linux kernel as an example. It's not a great example neither since it's mostly trivial to compile in comparison to the projects I had experience with.
It's true across a wide range of projects. I build a lot of stuff from source and I routinely look at performance counters and other similar metrics to see what the bottlenecks are (I'm almost clinically impatient).
Building e.g. LLVM, a project with much longer per-translation-unit build times, shows that memory bandwidth is even less of a bottleneck, whereas fetch latency becomes more of one.
> Core can be 100% busy but as I see you're a database kernel developer you must surely know that this can be an artifact of a stall in a memory backend of the CPU. I rest my case.
Hence my reference to doing a topdown analysis with perf. That provides you with a high-level analysis of what the actual bottlenecks are.
Typical compiler work (with typical compiler design) has lots of random memory accesses. Due to access latencies being what they are, that prevents you from actually doing enough memory accesses to reach a particularly high memory bandwidth.
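Little's law makes this concrete. The numbers below are my own back-of-the-envelope assumptions for illustration (cache line size aside, none are measurements from this thread): with dependent, random accesses, achievable bandwidth per core is capped at outstanding misses x line size / latency.

```python
# Little's law: concurrency = throughput * latency, rearranged for bandwidth.
line_size_bytes = 64    # one cache line moved per miss
latency_ns = 100.0      # assumed DRAM access latency
outstanding = 10        # assumed concurrent misses a core sustains on
                        # dependent (pointer-chasing) loads

# bytes per nanosecond is numerically GB/s
per_core_gbps = outstanding * line_size_bytes / latency_ns  # -> 6.4 GB/s
```

So a core doing mostly latency-bound random accesses moves only a few GB/s, versus tens of GB/s when streaming; that's why every core can be 100% busy while total bandwidth stays far below the machine's limit.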
- That workstation has 2x10 cores / 20 threads. I also executed the test on a newer workstation with 2x24 cores with similar results, but I thought the older workstation is more interesting, as the older workstation has a much worse memory bandwidth.
Sorry, but compilation is simply not memory bandwidth bound. There are significant memory latency effects, but bandwidth != latency.
- This is just wildly wrong.
On an older 2 socket workstation, with relatively poor memory bandwidth, I ran a linux kernel compile.
  perf stat --topdown --td-level 2

indicates that memory bandwidth is not a bottleneck. Fetch latency, branch mispredicts and the frontend are.

I also analyzed the memory bandwidth using

  perf stat --per-socket -M memory_bandwidth_read,memory_bandwidth_write -a -r0 sleep 1

and it never gets anywhere close to the memory bandwidth the system can trivially utilize (it barely reaches the bandwidth a single core can utilize).

iostat indicates there are pretty much no reads/writes happening on the relevant disks.
Every core is 100% busy.
> I (we?) think Postgres is incredibly important, and I think we have properly contextualized our use of it. Moreover, I think it is unfair to simply deny us our significant experience with Postgres because it was not unequivocally positive -- or to dismiss us recounting some really difficult times with the system as "bashing" it. Part of being a consequential system is that people will have experience with it; if one views recounting that experience as showing insufficient "respect" to its developers, it will have the effect of discouraging transparency rather than learning from it.
I agree that criticism is important and worthwhile! It's helpful though if it's at least somewhat actionable. We can't travel back in time to fix the problems you had in the early 2010s... My experience of the criticism of the last few years from the "oxide corner" is that it sometimes felt somewhat unrelated to the context and to today's postgres.
> if one views recounting that experience as showing insufficient "respect" to its developers
I should really have come up with a better word, but I'm still blanking on choosing a really apt word, even though I know it exists. I could try to blame ESL for it, but I can't come up with a good German word for it either... Maybe "goodwill". Basically believing that the other party is trying to do the right thing.