Comment by colonCapitalDee

colonCapitalDee Nov 23, 2025 parent

Great article. Can confirm, writing performance focused C# is fun. It's great having the convenience of async, LINQ, and GC for writing non-hot path "control plane" code, then pulling out Vector<T>, Span<T>, and so on for the hot path.

One question, how portable are performance benefits from tweaks to memory alignment? Is this something where going beyond rough heuristics (sequential access = good, order of magnitude cache sizes, etc) requires knowing exactly what platform you're targeting?

bofersen Nov 23, 2025

Author here. First of all, thanks for the compliment! It’s tough to get myself to write these days, so any motivation is appreciated.

And yes, once all the usual tricks have been exhausted, the nest step is looking at the cache/cache line sizes of the exact CPU you’re targeting and dividing the workload into units that fit inside the (lowest level possible) cache, so it’s always hot. And if you’re into this stuff, then you’re probably aware of cache-oblivious algorithms[0] as well :)

Personally, I almost never had the need to go too far into platform-specific code (except SIMD, of course), doing all the stuff in the post is 99% of the way there.

And yeah, C# is criminally underrated, I might write a post comparing high-perf code in C++ and C# in the future.

[0]: https://en.wikipedia.org/wiki/Cache-oblivious_algorithm

pjc50 Nov 25, 2025

Enjoying the C# appreciation.

>> C# has an awesome situation in here with its support for value types (ref structs), slices (spans), stack allocation, SIMD intrinsics (including AVX512!). You can even go bare-metal and GC-free with bflat.

There's been a really solid effort by the maintainers to improve performance in C# , especially with regard to keeping stuff off the heap. I think it's a fantastic language for doing backends in. It's unfortunate that one of the big language users, Unity, has not yet updated to the modern runtime.

neonsunset Nov 25, 2025 (dead)

hansvm Nov 25, 2025

One other trick I use reasonably often is using something more complicated than AoS or SoA layouts. Reasons vary (the false sharing padding in your article is one example), but cache lines are another good one. You might, e.g., want an AoSoA structure to keep the SoA portion of things on a single cache line if you know you'll always need both data elements (the entire struct), want to pack as much data in a cache line as possible, and also want that data to be aligned.

Great article by the way.

This item has no comments currently.

Preferences

Keyboard Shortcuts

Story Lists

Navigation

Miscellaneous