Comment by spacechild1

spacechild1 Jun 2, 2025

Yes, the design of std::atomic probably favors x64 in certain areas. However, you initially claimed that std::atomic has been designed with only x64 in mind. This is simply not true, which is easily proven by the fact that they explicitly support weak memory models.

> std::atomic is about 30% to 40% (with outlined atomics on, which is the default) slower than handrolled asm

Only for certain CAS operations. A 30% or 40% performance penalty doesn't sound too dramatic and certainly makes it "usable" in my book.

I appreciate your insight, but it could have been delivered with less hyperbole.

TuxSH Jun 2, 2025

Apologies for the style of my previous messages.

> they explicitly support weak memory models.

Sure, but memory ordering is orthogonal to LL/SC vs CAS.

To me, fetch_update not being present since std::atomic's inception is a major design oversight, as CAS can be emulated via LL/SC but not the other way round.

Furthermore, fetch_update code is easier to read and less awkward to write than CAS loops (which are currently the only option std::atomic offers for arbitrary read-modify-write operations, and this is what I'm complaining about).

> Only for certain CAS operations. A 30% or 40% performance penalty doesn't sound too dramatic and certainly makes it "usable" in my book.

I disagree. Atomic variables (atomic instructions) are usually used to implement synchronization primitives and are thus often found in very hot paths. In that regard, a 30% performance drop is actually quite bad.

Of course, if one restricts oneself to the member functions (fetch_add, fetch_or, etc.), all is fine, because these are optimized.

All in all, C++'s stdlib (to be precise, the parts that aren't just __builtin wrappers) is actually quite fine for most use cases, like PC applications. It is when one has latency constraints and/or severe memory constraints (e.g. < 512 KiB) that the stdlib starts to feel like a hindrance.

spacechild1 OP Jun 2, 2025

Thanks for the level-headed response!

> Sure, but memory ordering is orthogonal to LL/SC vs CAS.

Sure, but your original claim was that std::atomic has been designed with only x64 in mind. That's what I meant to argue against.

I agree that the omission of something like fetch_update() was an oversight, and I hope that it will make it into the C++ standard!

As a side note, here's what the Rust docs say about fetch_update():

> This method is not magic; it is not provided by the hardware. It is implemented in terms of AtomicUsize::compare_exchange_weak, and suffers from the same drawbacks.

https://doc.rust-lang.org/std/sync/atomic/struct.AtomicUsize...

So Rust's std::sync::atomic is equally "useless"? :)

TuxSH Jun 2, 2025

Heh, you're right, good catch: https://godbolt.org/z/3cEfbqM51

Looks like Rust's main motivation was readability, whereas P3330R0 has that plus performance on non-CAS hardware in mind. In any case, Rust's function could be optimized in the future, if they decide to do so.
