Yes, the design of std::atomic probably favors x64 in certain areas. However, you initially claimed that std::atomic has been designed with only x64 in mind. This is simply not true, which is easily proven by the fact that they explicitly support weak memory models.

> std::atomic is about 30% to 40% (with outlined atomics on, which is the default) slower than handrolled asm

Only for certain CAS operations. A 30% or 40% performance penalty doesn't sound too dramatic and certainly makes it "usable" in my book.

I appreciate your insight, but it could have been delivered with less hyperbole.


Apologies for the style of my previous messages.

> they explicitly support weak memory models.

Sure, but memory ordering is orthogonal to LL/SC vs CAS.

To me, fetch_update not being present from std::atomic's inception is a major design oversight, as CAS can be emulated via LL/SC but not the other way round.

Furthermore, fetch_update code is easier to read and less awkward to write than CAS loops (which are currently the only option std::atomic offers for custom read-modify-write operations, and this is what I'm complaining about).
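To make the readability point concrete, here is a minimal sketch of what such a helper could look like today. The free function `fetch_update` below is hypothetical: it is not the P3330R0 interface and not part of the standard library. Since it is built on `compare_exchange_weak`, it only hides the retry loop and gives no LL/SC advantage.

```cpp
#include <atomic>

// Hypothetical helper (an assumption for illustration, not a standard API).
// Applies f to the current value and retries until the CAS succeeds.
// Returns the value that was replaced.
template <typename T, typename F>
T fetch_update(std::atomic<T>& a, F f,
               std::memory_order success = std::memory_order_seq_cst,
               std::memory_order failure = std::memory_order_seq_cst) {
    T old = a.load(failure);
    // On failure, compare_exchange_weak writes the freshly observed value
    // into `old`, so f(old) is recomputed from up-to-date input each time.
    while (!a.compare_exchange_weak(old, f(old), success, failure)) {
    }
    return old;
}
```

A call site then reads as `fetch_update(x, [](int v) { return v * 2 + 1; });` instead of spelling out the load, the loop, and the retry by hand.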

> Only for certain CAS operations. A 30% or 40% performance penalty doesn't sound too dramatic and certainly makes it "usable" in my book.

I disagree. Atomic variables (atomic instructions) are usually used to implement synchronization primitives, and are thus often meant to be used in very hot paths. 30% perf drops are actually quite bad, in that regard.

Of course if one is restricting themselves to using only member methods (fetch_add, fetch_or, etc.), then all is fine because these methods are optimized.
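A small sketch of the difference, assuming a made-up example (a saturating counter; the function names are mine): the plain increment maps to a dedicated member, while anything without one falls back to a hand-written CAS loop.

```cpp
#include <atomic>
#include <cstdint>

// Dedicated member: the implementation can pick the best instruction
// sequence for the target.
inline std::uint32_t counter_add(std::atomic<std::uint32_t>& c, std::uint32_t n) {
    return c.fetch_add(n, std::memory_order_relaxed);
}

// No member exists for "add, but clamp at cap", so a CAS loop is needed.
// Returns the previous value, like fetch_add does.
inline std::uint32_t saturating_add(std::atomic<std::uint32_t>& c,
                                    std::uint32_t n, std::uint32_t cap) {
    std::uint32_t old = c.load(std::memory_order_relaxed);
    std::uint32_t desired;
    do {
        // Written to avoid unsigned wrap-around in both old + n and cap - n.
        desired = (n >= cap || old >= cap - n) ? cap : old + n;
    } while (!c.compare_exchange_weak(old, desired,
                                      std::memory_order_relaxed,
                                      std::memory_order_relaxed));
    return old;
}
```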

All in all, C++'s stdlib (the parts that aren't just __builtin wrappers, to be precise) is actually quite fine for most use-cases, like PC applications. Indeed, it is when one has latency constraints and/or severe memory constraints (e.g. < 512 KiB) that the stdlib feels like a hindrance.

Thanks for the leveled response!

> Sure, but memory ordering is orthogonal to LL/SC vs CAS.

Sure, but your original claim was that std::atomic has been designed with only x64 in mind. That's what I meant to argue against.

I agree that the omission of something like fetch_update() has been an oversight and I hope that it will make it into the C++ standard!

As a side note, here's what the Rust docs say about fetch_update():

> This method is not magic; it is not provided by the hardware. It is implemented in terms of AtomicUsize::compare_exchange_weak, and suffers from the same drawbacks.

https://doc.rust-lang.org/std/sync/atomic/struct.AtomicUsize...

So Rust's std::sync::atomic is equally "useless"? :)

Heh, you're right, good catch: https://godbolt.org/z/3cEfbqM51

Looks like Rust's main motivation was readability, whereas P3330R0 has that plus performance on non-CAS hardware in mind. In any case, Rust's function could be optimized in the future, if they decide to do so.
