> they explicitly support weak memory models.
Sure, but memory ordering is orthogonal to LL/SC vs CAS.
To me, the absence of fetch_update from std::atomic since its inception is a major design oversight, as CAS can be emulated via LL/SC but not the other way round.
Furthermore, fetch_update code is easy to read and less awkward to write than CAS loops, which are currently the only option std::atomic offers (and this is what I'm complaining about).
> Only for certain CAS operations. A 30% or 40% performance penalty doesn't sound too dramatic and certainly makes it "usable" in my book.
I disagree. Atomic variables (atomic instructions) are usually used to implement synchronization primitives, and thus often sit on very hot paths. In that regard, a 30% performance drop is actually quite bad.
Of course, if one restricts oneself to the member functions (fetch_add, fetch_or, etc.), all is fine, because those are optimized.
All in all, C++'s stdlib (to be precise, the parts that aren't just __builtin wrappers) is actually quite fine for most use cases, such as PC applications. It is only when one has latency constraints and/or severe memory constraints (e.g. < 512 KiB) that the stdlib feels like a hindrance.
> Sure, but memory ordering is orthogonal to LL/SC vs CAS.
Sure, but your original claim was that std::atomic was designed with only x64 in mind. That's what I meant to argue against.
I agree that the omission of something like fetch_update() was an oversight, and I hope it will make it into the C++ standard!
As a side note, here's what the Rust docs say about fetch_update():
> This method is not magic; it is not provided by the hardware. It is implemented in terms of AtomicUsize::compare_exchange_weak, and suffers from the same drawbacks.
https://doc.rust-lang.org/std/sync/atomic/struct.AtomicUsize...
So Rust's std::sync::atomic is equally "useless"? :)
It looks like Rust's main motivation was readability, whereas P3330R0 has that as well as performance on non-CAS hardware in mind. In any case, Rust could still optimize its function in the future if they decide to.
> std::atomic is about 30% to 40% (with outlined atomics on, which is the default) slower than handrolled asm
Only for certain CAS operations. A 30% or 40% performance penalty doesn't sound too dramatic and certainly makes it "usable" in my book.
I appreciate your insight, but it could have been delivered with less hyperbole.