
This is a good resource. But for the computer vision and machine learning practitioner most of the fun can start where this article ends.

nvcc from the CUDA toolkit has a compatibility range with the underlying host compilers like gcc. If you install a newer CUDA toolkit on an older machine, likely you'll need to upgrade your compiler toolchain as well, and fix the paths.
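To see what you actually have installed, something like the following might help. It is only a sketch: it parses the usual `nvcc --version` and `gcc -dumpversion` output, and `max_supported_gcc_major` is a hypothetical parameter you would have to fill in yourself from NVIDIA's installation guide for your toolkit release. No compatibility table is baked in here.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_output):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_output or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_gcc_major(dumpversion_output):
    """Return the major version from `gcc -dumpversion` text, or None."""
    match = re.match(r"\s*(\d+)", dumpversion_output or "")
    return int(match.group(1)) if match else None


def tool_output(command):
    """Run a tool and return its stdout, or None if it is not on PATH."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, capture_output=True, text=True).stdout


def host_compiler_ok(gcc_major, max_supported_gcc_major):
    """True if gcc is within the limit you looked up for your toolkit.

    max_supported_gcc_major is an assumption supplied by the caller.
    """
    return gcc_major <= max_supported_gcc_major
```

On a real machine you would call `parse_nvcc_release(tool_output(["nvcc", "--version"]))` and `parse_gcc_major(tool_output(["gcc", "-dumpversion"]))`, then compare the result against the documented limit.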

While orchestration in many (research) projects happens from Python, some depend on building CUDA extensions. An innocent-looking Python project may not ship the compiled kernels and may require a CUDA toolkit to work correctly. Some package management solutions can install CUDA toolkits (conda/mamba, pixi); the pure-Python ones (pip, uv) do not. This leaves you to match the correct CUDA toolkit to your Python environment for each project. conda specifically provides different channels (defaults/nvidia/pytorch/conda-forge), and since conda 4.6 it supports a strict channel priority, meaning "if a name exists in a higher-priority channel, lower ones aren't considered". Strict priority can make your requirements unsatisfiable, even though a suitable version of each required package exists somewhere in the collection of channels. uv is neat and fast and awesome, but leaves you alone in dealing with the CUDA toolkit.

Also, code that compiles with older CUDA toolkit versions may not compile with newer CUDA toolkit versions. Newer hardware may require a CUDA toolkit version that is newer than what the project maintainer intended. PyTorch ships with a specific CUDA runtime version. If you have additional code in your project that also is using CUDA extensions, you need to match the CUDA runtime version of your installed PyTorch for it to work. Trying to bring up a project from a couple of years ago to run on latest hardware may thus blow up on you on multiple fronts.
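A quick way to spot a mismatch before a build fails is to compare `torch.version.cuda` with the toolkit version reported by nvcc. The sketch below assumes both are plain "major.minor" strings. The category names it returns are my own labels, not anything PyTorch defines, and how strictly a given extension treats a minor mismatch is up to that project.

```python
def installed_torch_cuda():
    """Return torch.version.cuda (e.g. "12.1"), or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds


def compare_cuda_versions(torch_cuda, toolkit_cuda):
    """Classify how a PyTorch CUDA version relates to a toolkit version.

    Both arguments are "major.minor" strings; torch_cuda may be None.
    The returned labels are illustrative, not a PyTorch convention.
    """
    if torch_cuda is None:
        return "no-cuda-pytorch"
    torch_major, torch_minor = (int(p) for p in torch_cuda.split(".")[:2])
    tk_major, tk_minor = (int(p) for p in toolkit_cuda.split(".")[:2])
    if torch_major != tk_major:
        return "major-mismatch"
    if torch_minor != tk_minor:
        return "minor-mismatch"
    return "match"
```

For example, `compare_cuda_versions(installed_torch_cuda(), "12.4")` tells you whether a local 12.4 toolkit lines up with the PyTorch wheel in the current environment.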


> nvcc from the CUDA toolkit has a compatibility range with the underlying host compilers like gcc. If you install a newer CUDA toolkit on an older machine, likely you'll need to upgrade your compiler toolchain as well, and fix the paths.

Conversely, nvcc often stops working with major upgrades of gcc/clang. Fun times, indeed.

This is why a lot of people just use NVIDIA's containers even for local solo dev. It's a hassle to set up initially (docker/podman hell) but all the tools are there and they work fine.

> This is why a lot of people just use NVIDIA's containers even for local solo dev. It's a hassle to set up initially (docker/podman hell) but all the tools are there and they work fine.

Yeah, which I feel like is fine for one project, or one-offs, but once you've accumulated projects, having individual 30GB images for each of them quickly adds up.

I found that most of my issues went away as I started migrating everything to `uv` for the Python stuff, and nix for everything system related. Now I can finally go back to a one-year-old ML project and be sure it'll run like before, and projects share a bit more data.

What trouble have you had specifically? On both Win and Linux, installing the CUDA toolkit (e.g. v13) just works for me. My use case is compiling kernels (or cuFFT FFI) using nvcc for FFI in Rust programs and libs.

Yep, right now NVIDIA libs are broken with clang-21 and recent glibc, due to things like rsqrt() having throw() in the declaration but not in the definition.

> Also, code that compiles with older CUDA toolkit versions may not compile with newer CUDA toolkit versions. Newer hardware may require a CUDA toolkit version that is newer than what the project maintainer intended.

This is the part I find confusing, especially as NVIDIA doesn't make it easy to find and download the old toolkits. Is this effectively saying that just choosing the right --arch and --code flags isn't enough to support older versions? And that, because nvcc statically links in the runtime library by default, newer toolkits may produce code that just won't run on older drivers? In other words, is it true that to support old hardware you need to download and use old CUDA toolkits, regardless of nvcc flags? (And to support newer hardware you may need to compile with newer toolkits.)

That's how I read it, which seems unfortunate.
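For the flags half of that question, here is a rough sketch of how a build script might assemble `-gencode` options for several compute capabilities, plus PTX for the newest one so that later GPUs can JIT-compile it. `gencode_flags` is a hypothetical helper, not part of any toolkit, and it only handles plain numeric capabilities like "8.6". It also only controls which GPU code gets embedded. It does nothing about which driver version the linked runtime expects, or about architectures a given toolkit no longer supports.

```python
def gencode_flags(compute_capabilities, ptx_for_newest=True):
    """Build nvcc -gencode flags for capabilities given as e.g. "8.0".

    Hypothetical helper: emits one SASS target per capability and,
    optionally, PTX for the newest capability in the list.
    """
    archs = sorted({cc.replace(".", "") for cc in compute_capabilities}, key=int)
    flags = [f"-gencode=arch=compute_{a},code=sm_{a}" for a in archs]
    if ptx_for_newest and archs:
        newest = archs[-1]
        # code=compute_XX embeds PTX rather than machine code
        flags.append(f"-gencode=arch=compute_{newest},code=compute_{newest}")
    return flags
```

Whether a given toolkit still accepts each of those targets is something you have to check per release.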

Yes, this is the actual lived reality. Thank you for outlining it so well.

Sounds like most of these problems come from using Python.

You imply these problems would go away (or wouldn't be replaced by new ones) with another language.

Removing layers usually improves stability.
