exDM69
Joined 10,517 karma

  1. Yes, sadly this isn't a part of standard C or C++.

    It is available as a language extension in Clang and GCC and widely used (e.g. by the Linux kernel).

    Unfortunately it is not supported by the third major compiler out there so many projects can't or don't want to use it.

  2. > tons of drivers that dont implement the extensions that really improve things.

    This isn't really the case, at least on desktop side.

    All three desktop GPU vendors support Vulkan 1.4 (or most of the features via extensions) on all major platforms even on really old hardware (e.g. Intel Skylake is 10+ years old and has all the latest Vulkan features). Even Apple + MoltenVK is pretty good.

    Even mobile GPU vendors have pretty good support in their latest drivers.

    The biggest issue is that Android consumer devices don't get GPU driver updates so they're not available to the general public.

  3. Anything involving people on the ground is just too slow.

    It takes radars, interceptor drones, sensor networks, etc. Stuff like this is in active development but not widely deployed yet.

  4. I've seen my fair share of frontline combat videos from Ukraine.

    The hard part isn't shooting a drone when it is in shotgun range. It's getting the shooter close enough to the drone to have a chance of taking the shot in the first place.

    For example the drones mentioned in the article can fly at 2.5km altitude at 140km/h.

  5. The drones here aren't your neighbor's kids' quadrotors. Some sightings over airports have been large (>2m) fixed wing aircraft travelling at 200 km/h. Even the quads are pretty fast. And they can appear out of nowhere, taking off from the ground near the target.

    Shooting them down from the ground is next to impossible. They don't hover around waiting for someone to come by with a shotgun in hand, and catching them by land (i.e. chasing them in a car) is not feasible.

    Just to give an idea how hard it is to hit airborne targets from the ground with traditional guns: I once spent an afternoon shooting at a slow moving fixed wing target drone with tracer rounds from a 12.7mm anti-aircraft machine gun. There were about 50 of us taking turns, each with a few hundred rounds to shoot at the damn thing and the target aircraft didn't get a single hit.

    My guess is that the drones are conducting signals intelligence, listening to radar signals and radio comms around sensitive installations (airports, military bases) and surveying the response time to a sighting.

  6. > Rust has native SIMD support

    std::simd is nightly only.

    > while in standard C there is no way to express that.

    In ISO Standard C(++) there's no SIMD.

    But in practice C vector extensions are available in Clang and GCC which are very similar to Rust std::simd (can use normal arithmetic operations).

    Unless you're talking about CPU specific intrinsics, which are available in both languages (core::arch intrinsics vs. xmmintrin.h) in all big compilers.
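
    As a minimal sketch of the intrinsics route on the Rust side (assuming an x86_64 target, where SSE is part of the baseline so no runtime feature detection is needed; `add4` is a made-up example function, with a scalar fallback for other architectures):

    ```rust
    // 4-wide f32 add using core::arch intrinsics (the Rust analogue of xmmintrin.h).
    #[cfg(target_arch = "x86_64")]
    fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        use std::arch::x86_64::*;
        // SAFETY: SSE is part of the x86_64 baseline, so these are always available.
        unsafe {
            let va = _mm_loadu_ps(a.as_ptr());
            let vb = _mm_loadu_ps(b.as_ptr());
            let mut out = [0.0f32; 4];
            _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(va, vb));
            out
        }
    }

    // Scalar fallback for non-x86_64 targets.
    #[cfg(not(target_arch = "x86_64"))]
    fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
    }

    fn main() {
        let r = add4([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]);
        assert_eq!(r, [11.0, 22.0, 33.0, 44.0]);
    }
    ```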

  7. > If the DoD enforces the requirement for Ada, Universities, job training centers, and companies will follow

    DoD did enforce a requirement for Ada but universities and others did not follow.

    The JSF C++ guidelines were created to circumvent the DoD Ada mandate (as discussed in the video).

  8. Correct me if I'm wrong, but during this timeframe (circa 2005), Java was not open source at all. OpenJDK was announced in 2006 and its first release was in 2008, by which time the days of Java in the browser were more or less over.
  9. Exactly.

    Java was so buggy and had so many security issues about 20 years ago that my local authorities gave a security advisory to not install it at all in end user/home computers. That finally forced the hand of some banks to stop using it for online banking apps.

    Flash also had a long run of security issues.

  10. The project itself is cool and useful but the motivating example of crypto (primitives?) isn't great.

    Cryptography is already difficult to write in high level languages without introducing side channels via timing, branch predictor, caches etc.

    Doing cryptography through two high level compilers, especially when the code was not designed and written for that, is an exercise fraught with peril.

    Tbf, this is just nitpicking about the article, not the project itself.

  11. Hi, I don't have public examples to share but I can give an explanation of a simple scenario.

    I have a container of resources, e.g. textures. When the GPU wants to use them, the CPU leases them until a point in time in the future denoted by a value (u64) of a GPU timeline semaphore. The handle and value of the semaphore are added to a list guarded by a mutex. Then GPU work is kicked off and the GPU will increment the semaphore to that value when done.

    In the Drop implementation of the container, we need to wait until all semaphores reach their respective value before freeing resources, and do so even if some thread panicked while holding the lock guarding the list. This is where I use .unwrap_or_else to get the list from the poison value.

    It's not infeasible to try to catch any errors and propagate them when the lock is grabbed. But this is mostly for OOM and asserts that are not expected to fire. The ergonomics would be worse if the "lease" function were fallible.

    This said, I would not object to poisoning being made optional.

  12. I've used recovering from poisoned state in impl Drop in quite a few places.

    In my case it's usually waiting for the GPU to finish some asynchronous work that's been spun up by CPU threads that may have panicked while holding the lock. This is necessary to avoid freeing resources that the GPU may still be using.

    I usually wrap this in `if !std::thread::panicking() { ... }`, so I don't end up waiting (possibly forever) if I'm already cleaning up after a panic.
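
    A minimal sketch of that pattern (hypothetical names; the real code waits on GPU timeline semaphores rather than iterating over plain integers):

    ```rust
    use std::sync::{Arc, Mutex};

    // Hypothetical container of in-flight work; the real version tracks GPU
    // timeline semaphore handles and values instead of plain u64s.
    struct PendingWork {
        pending: Arc<Mutex<Vec<u64>>>,
    }

    impl Drop for PendingWork {
        fn drop(&mut self) {
            // Skip the (potentially long) wait if we're already unwinding from a panic.
            if !std::thread::panicking() {
                // Recover the list even if another thread panicked while holding the lock.
                let pending = self
                    .pending
                    .lock()
                    .unwrap_or_else(|poison| poison.into_inner());
                for value in pending.iter() {
                    // Here the real code would wait for the semaphore to reach `value`.
                    let _ = value;
                }
            }
        }
    }

    fn main() {
        let work = PendingWork { pending: Arc::new(Mutex::new(vec![1, 2, 3])) };
        drop(work); // runs the Drop impl; nothing panicked, so the wait path executes
        println!("cleaned up");
    }
    ```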

  13. Worth noting that this is not `std::mutex` or `parking_lot::mutex` as discussed in the article, but `tokio::sync::Mutex` in cancellable async code.
  14. I disagree, lock poisoning is a good way of improving correctness of concurrent code in case of fatal errors. As demonstrated by the benchmarks in this article, it's not very expensive for typical use cases.

    In 99% of the cases where one thread has panicked while holding a lock, you want to panic the thread that attempts to grab the lock. The contents of anything inside the lock are very much undefined and continuing will lead to unpredictable results. So most of the time you just want:

        let guard = mutex.lock().expect("poisoned");
    
    The last 1% is when you want to clean up something even if a panic has occurred. This is usually in an impl Drop situation. It's not much more verbose either, just:

        let guard = mutex.lock().unwrap_or_else(|poison| poison.into_inner());
    
    What is painful is trying to propagate the poison value as an error using `?`. In that case you're probably better off using a match expression, because the usual `.into()` will not play nice with common error handling crates (thiserror, anyhow); otherwise you need to implement `From` manually for the error types and drop the contents of the poison error before propagating.

    This might be the case for long running server processes where you have n:m threading with long running threads and want to keep processing other requests even if one request fails. Although in that case you probably want (or your framework provides) some kind of robustness with `catch_unwind` that will log the errors, respond with HTTP 500 or whatever and then resume. Because that's needed to catch panics from non-mutex related code.
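
    A self-contained sketch of both paths: a worker panics while holding the lock, the default path sees the poison, and the cleanup path recovers the inner value anyway:

    ```rust
    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let data = Arc::new(Mutex::new(vec![1, 2, 3]));

        // A worker thread panics while holding the lock, poisoning the mutex.
        let d = Arc::clone(&data);
        let _ = thread::spawn(move || {
            let _guard = d.lock().unwrap();
            panic!("boom");
        })
        .join();

        // Normal code path: treat the poison as fatal.
        assert!(data.lock().is_err());

        // Cleanup path (e.g. in an impl Drop): recover the contents anyway.
        let guard = data.lock().unwrap_or_else(|poison| poison.into_inner());
        assert_eq!(*guard, vec![1, 2, 3]);
        println!("recovered {} items", guard.len());
    }
    ```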

  15. > A f32x16 version would also be faster on 256b hardware, but spill in SSE. For Zen5 you probably want to use f32x32.

    Yeah, exceeding native vector width is kinda just adding another round of loop unrolling. Sometimes it helps, sometimes it doesn't. This is probably mostly about register pressure.

    And architecture specific benchmarking is required if you want to get most performance out of it.

    > I'd prefer if std::simd would encurage relative to native SIMD width scaling (and support scalable SIMD ISAs).

    It is possible to write width-generic SIMD code (ie. have vector width as a generic parameter) in Rust std::simd (or C++ templates and vector extensions) and make it relative to native vector width (although you need to define the native width explicitly).

    In my problem domain (computer graphics etc) the vector width is often mandated by the task at hand (e.g. 2d vs 3d). It's often not about doing something on an array of size N. This does not lead to optimal HW utilization, but it's convenient and still a lot faster than scalar code.

    Scalable SIMD ISAs are kind of a new thing, so not sure how well current std::simd or C vector extensions (or LLVM IR SIMD ops) map to the HW. Maybe they would be better served by another kind of API? I don't really know, haven't had the privilege of writing any scalable vector code yet.

    What I'm trying to say is IMO std::simd works well enough and should probably be stabilized (almost) as is, barring any show stopper issues. It's already useful and has been for many years.
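
    Since std::simd itself is nightly-only, the width-generic idea can be sketched in stable Rust with const generics over plain arrays (which the optimizer will typically autovectorize); with std::simd you'd write the same shape using `Simd<f32, N>`. `vadd` here is a made-up example function:

    ```rust
    // Width-generic "vector" add: N plays the role of the SIMD width.
    // With nightly std::simd this would operate on Simd<f32, N>; here plain
    // arrays stand in, and the compiler is usually able to vectorize the loop.
    fn vadd<const N: usize>(a: [f32; N], b: [f32; N]) -> [f32; N] {
        let mut out = [0.0f32; N];
        for i in 0..N {
            out[i] = a[i] + b[i];
        }
        out
    }

    fn main() {
        // The same generic code instantiated at two "vector widths".
        let r4 = vadd::<4>([1.0; 4], [2.0; 4]);
        let r8 = vadd::<8>([1.0; 8], [2.0; 8]);
        assert_eq!(r4, [3.0; 4]);
        assert_eq!(r8, [3.0; 8]);
    }
    ```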

  16. > Getting maximum performance out of SIMD requires rolling your own code with intrinsics

    Not disagreeing with this statement in general, but with std::simd I can get 80% of the performance with 20% of the effort compared to intrinsics.

    For the last 20%, there's a zero cost fallback to intrinsics when you need it.

  17. Despite all of these issues you mention, std::simd is perfectly usable in the state it is in today in nightly Rust.

    I've written thousands and thousands of lines of Rust SIMD code over the last ~4 years and it's, in my opinion, a pretty nice way of doing SIMD code that is portable.

    I don't know about the specific issues in stabilization, but the API has been relatively stable, although there were some breaking changes a few years ago.

    Maybe you can't extract 100% of your CPUs capabilities using it, but I don't find that a problem because there's a zero-cost fallback to CPU-specific intrinsics when necessary.

    I recently wrote some computer graphics code and I could get really nice performance (~20x my scalar code, 5x from just a naive translation). And the same codebase can be compiled to AVX2, SSE2 and ARM NEON. It uses f32x8's (256b vector width), which are not available on SSE or NEON, but the compiler can split those vectors. The f32x8 version was faster than f32x4 even on 128b hardware. I would've needed to painstakingly port this codebase to each CPU, so it was at least a 3x reduction in lines of code (and more in programmer time).

  18. In my experience, compiling C with -ffast-math will tremendously improve floating point autovectorization and optimizations to SIMD (C vector extensions, which are similar to Rust std::simd) code in general.

    This obviously has a lot of caveats, and should only be enabled on a per function or per file basis.

    Unfortunately Rust does not currently have options for adjusting per-function compiler optimization parameters. This is possible in some C compilers using function attributes.

  19. Two more from the world of analog music/guitar electronics:

    1) Ring modulator: https://en.wikipedia.org/wiki/Ring_modulation

    A device used to multiply two analog signals in time domain. Best known for the sound of the Daleks in the original 1960s Doctor Who series. Has some applications outside of music and sound effects. If you can find those old fashioned audio transformers, this effect does not require a power source.

    2) Diode clipper: https://en.wikipedia.org/wiki/Clipper_(electronics)

    Two diodes in parallel with opposite polarities. Clips the incoming AC signal to a +/- diode threshold voltage. Put a high voltage gain amplifier stage in front of it and you get the classic electric guitar distortion tone you know and love. Allegedly works best with germanium-unobtainium diodes. In their absence, using two different kinds of diodes can also have pleasant tonal qualities.
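
    As a numeric sketch of the clipping behaviour (an idealized hard clip at a ±0.6 V threshold, roughly a silicon diode's forward voltage; real diodes clip with a soft knee, and `diode_clip` is a made-up name):

    ```rust
    // Idealized diode clipper: pass the signal through until it hits the
    // diode threshold voltage, then clamp. A hard clamp is a simplification,
    // but it captures the basic distortion behaviour.
    fn diode_clip(sample: f32, threshold: f32) -> f32 {
        sample.clamp(-threshold, threshold)
    }

    fn main() {
        let vt = 0.6; // ~silicon diode forward voltage
        // A high-gain stage in front pushes the sine well past the threshold...
        let gained: Vec<f32> = (0..8)
            .map(|i| 5.0 * (i as f32 * std::f32::consts::TAU / 8.0).sin())
            .collect();
        // ...and the clipper squares it off into the classic distortion waveform.
        let clipped: Vec<f32> = gained.iter().map(|&s| diode_clip(s, vt)).collect();
        assert!(clipped.iter().all(|&s| s.abs() <= vt));
        println!("peak after clipping: {:.2}", clipped.iter().fold(0.0f32, |m, &s| m.max(s.abs())));
    }
    ```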

  20. > To kill the thread, set the stop flag and cond_signal the condvar

    This is a race condition. When you "spin" on a condition variable, the stop flag you check must be guarded by the same mutex you give to cond_wait.

    See this article for a thorough explanation:

    https://zeux.io/2024/03/23/condvars-atomic/
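
    The correct shape in Rust's std (the same rule applies to pthreads): the stop flag lives inside the mutex the condvar uses, the waiter re-checks it in a loop, and the signaler sets it while holding that same lock before notifying:

    ```rust
    use std::sync::{Arc, Condvar, Mutex};
    use std::thread;

    fn main() {
        // The stop flag is *inside* the mutex paired with the condvar; checking
        // a flag outside that lock is exactly the race described above.
        let shared = Arc::new((Mutex::new(false), Condvar::new()));

        let worker = {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let (lock, cvar) = &*shared;
                let mut stop = lock.lock().unwrap();
                // Re-check the predicate after every wakeup (spurious wakeups happen).
                while !*stop {
                    stop = cvar.wait(stop).unwrap();
                }
            })
        };

        // To stop the thread: set the flag *while holding the lock*, then notify.
        let (lock, cvar) = &*shared;
        *lock.lock().unwrap() = true;
        cvar.notify_one();

        worker.join().unwrap();
        println!("worker stopped");
    }
    ```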
