Heck, even std::atomic was designed with only x64 in mind (it clearly shows), and is unusable outside it. One is incentivized to write their own "atomic" class for RMW-centric ISAs like AArch32 and AArch64, at least until P3330R0 is approved.
And of course, Rust already has "fetch_update"...
It certainly wasn't.
> and is unusable outside it
Total hyperbole. It's perfectly usable on ARM and other platforms.
P3330R0 looks like a nice addition, though.
The idiomatic way to do RMW (beyond simple stuff like fetch-increment) with std::atomic is a compare_exchange loop, which maps 1:1 to x64 assembly, and since fetch_update isn't provided, it's the only way to do it. That's way too close for comfort. See [1] for a comparison.
> Total hyperbole. It's perfectly usable on ARM and other platforms.
It's not hyperbole. std::atomic is portable, but that's all it is.
std::atomic is about 30% to 40% (with outlined atomics on, which is the default) slower than handrolled asm (or custom reimplementations that provide fetch_update, which amounts to the same thing). See [2] for a benchmark.
> std::atomic is about 30% to 40% (with outlined atomics on, which is the default) slower than handrolled asm
Only for certain CAS operations. A 30% or 40% performance penalty doesn't sound too dramatic and certainly makes it "usable" in my book.
I appreciate your insight, but it could have been delivered with less hyperbole.
> they explicitly support weak memory models.
Sure, but memory ordering is orthogonal to LL/SC vs CAS.
To me, fetch_update not being present from std::atomic's inception is a major design oversight, as CAS can be emulated via LL/SC but not the other way round.
Furthermore, fetch_update code is easier to read and less awkward to write than CAS loops (which are currently the only option std::atomic offers, and that is what I'm complaining about).
> Only for certain CAS operations. A 30% or 40% performance penalty doesn't sound too dramatic and certainly makes it "usable" in my book.
I disagree. Atomic variables (atomic instructions) are usually used to implement synchronization primitives, and are thus often found on very hot paths. A 30% performance drop is actually quite bad in that regard.
Of course, if one restricts themselves to the dedicated member functions (fetch_add, fetch_or, etc.), then all is fine, because those are optimized.
All in all, C++'s stdlib (the parts that aren't just __builtin wrappers, to be precise) is actually quite fine for most use cases, like PC applications. It is when one has latency constraints and/or severe memory constraints (e.g. < 512 KiB) that the stdlib feels like a hindrance.
Pretty much everything else (e.g. iostreams) is horrible.