I know several HPC sites that use it: they ordered cookie-cutter server designs/models to simplify logistics, but not all of their users need the full capabilities, so they slice and dice some portion into smaller instances for smaller jobs.
E.g.:
* https://hpc.njit.edu/MIG/
* https://www.rc.virginia.edu/2025/07/hpc-maintenance-aug-12-2...
> Only seven slices. Seven?
At some point the slices become so small that they stop being useful. An A100 can have as 'little' as 40 GB of memory, and with seven slices you're down to 5 GB per instance:
* https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#a10...
> Extremely limited hardware support.
It's a reasonable argument that you'd only need it at the top end of the hardware: workloads that need all of that compute and memory are not that common, so downshifting some hardware into resource slices that match more typical jobs is not crazy. Of course you can then upshift when needed: but if you had purchased 'smaller' cards because that's what you thought you (initially) needed, then you're stuck at that level. There's no way for you to upshift/de-downshift.
> Difficult to configure - like the early days of CUDA.
How hard is it to run nvidia-smi?
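For reference, a minimal sketch of the workflow on an A100-class card (device index and profile name are illustrative; check the MIG user guide for the profiles your card actually supports):

    # Enable MIG mode on GPU 0 (the GPU needs to be idle, and may need a reset)
    sudo nvidia-smi -i 0 -mig 1

    # List the GPU instance profiles this card supports
    sudo nvidia-smi mig -lgip

    # Create three 1g.5gb GPU instances plus their compute instances (-C)
    sudo nvidia-smi mig -cgi 1g.5gb,1g.5gb,1g.5gb -C

    # The MIG devices now show up with their own UUIDs
    nvidia-smi -L

That's a handful of commands, not a config-file maze.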
> Meanwhile, don’t forget that if your workloads are cooperative, you can put all the processes you want on a single GPU and they’ll happily multitask. No security boundary of course, but who knows how good MIG is at that.
The security boundary of MIG is a lot better than MPS's, which basically has no security. I know several folks running HPC clusters that use it to isolate the Slurm workloads of different users. And my search-fu has found no CVEs or published papers jailbreaking out of MIG instances.
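For what it's worth, the Slurm side is also fairly turnkey these days: recent Slurm releases can auto-detect MIG devices through NVML. A hedged sketch of what the node config can look like (the hostname and GRES type string are illustrative; the type depends on what NVML reports for your hardware):

    # gres.conf -- let Slurm discover the MIG instances itself
    AutoDetect=nvml

    # slurm.conf -- advertise the seven 1g.5gb slices on the node
    GresTypes=gpu
    NodeName=gpu01 Gres=gpu:a100_1g.5gb:7

Users then request a slice with something like --gres=gpu:a100_1g.5gb:1 and Slurm confines each job to its own MIG instance.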
> I’d greatly prefer better tools for cooperative GPU sharing like per process memory limits or compute priority levels. Also seems like it should be way easier to implement.
This is what MPS is for:
* https://docs.nvidia.com/deploy/mps/index.html
* https://man.archlinux.org/man/extra/nvidia-utils/nvidia-cuda...
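MPS exposes exactly those (cooperative) knobs: an active-thread percentage to cap compute, and, on newer CUDA versions, a per-client device memory limit. A rough sketch, assuming a recent CUDA/driver stack (the client binary name is hypothetical, and exact variable support varies by CUDA version):

    # Start the MPS control daemon
    export CUDA_VISIBLE_DEVICES=0
    nvidia-cuda-mps-control -d

    # Cap this client at ~30% of the SMs
    export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=30

    # Limit how much device memory this client can pin/allocate (CUDA 11.5+)
    export CUDA_MPS_PINNED_DEVICE_MEM_LIMIT="0=8G"

    ./my_cuda_app   # hypothetical client

    # Shut the daemon down when done
    echo quit | nvidia-cuda-mps-control

But note these are limits on cooperating clients, not a security boundary.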