
throw0101d
If your workload can't actually use the whole (NVIDIA) GPU, it is possible to slice it up so that it can be shared between multiple users:

* https://docs.nvidia.com/datacenter/tesla/mig-user-guide/

* https://www.nvidia.com/en-us/technologies/multi-instance-gpu...

Or having multiple processes from one user share it:

* https://docs.nvidia.com/deploy/mps/index.html
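
Once an admin has sliced the card, pointing a job at a slice from the user side is just a matter of setting CUDA_VISIBLE_DEVICES to a MIG device UUID (as printed by nvidia-smi -L). A rough Python launcher sketch; the UUID and train.py are placeholders:

    import os
    import subprocess

    # Placeholder UUID; list the real ones with `nvidia-smi -L`.
    mig_uuid = "MIG-4b5e9f1a-0000-0000-0000-000000000000"

    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = mig_uuid  # the job sees only this slice

    # train.py stands in for whatever CUDA workload gets the slice.
    subprocess.run(["python", "train.py"], env=env, check=True)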


jsheard
AIUI it's only available on workstation/server cards though; it's one of the levers they pull to artificially segment their lineup.
oofbey
MIG virtualization is IMHO weak sauce. Only seven slices. Seven? Extremely limited hardware support. Difficult to configure, like the early days of CUDA. It's been in the works for, what, seven years now, and it's still barely functional.

Meanwhile, don’t forget that if your workloads are cooperative, you can put all the processes you want on a single GPU and they’ll happily multitask. No security boundary of course, but who knows how good MIG is at that.

I'd greatly prefer better tools for cooperative GPU sharing, like per-process memory limits or compute priority levels. That also seems like it should be way easier to implement. As containerization and k8s have proven, there's a ton of utility in bin-packing your own workloads better without rock-solid security boundaries.
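
The closest existing knob I know of is per-stream priority, but that only works within a single process, not across users. A PyTorch sketch of what I mean, assuming a CUDA build; it shows the mechanism, not the cross-process tool I actually want:

    import torch

    # Lower value = higher priority; the valid range is device-dependent.
    high = torch.cuda.Stream(priority=-1)
    low = torch.cuda.Stream(priority=0)

    a = torch.randn(4096, 4096, device="cuda")

    with torch.cuda.stream(low):
        b = a @ a    # bulk/background work
    with torch.cuda.stream(high):
        c = a.sum()  # latency-sensitive work gets scheduling preference

    torch.cuda.synchronize()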

throw0101d OP
> MIG virtualization is IMHO weak sauce.

I know several HPC sites that use it: they (e.g.) ordered cookie-cutter server designs/models to simplify logistics, but not all of their users need the complete capabilities, and so they slice/dice some portion into smaller instances for smaller jobs.

E.g.:

* https://hpc.njit.edu/MIG/

* https://www.rc.virginia.edu/2025/07/hpc-maintenance-aug-12-2...

> Only seven slices. Seven?

At some point the slices become so small that they stop being useful. An A100 can have as 'little' as 40G of memory, and at seven slices you're down to about 5G per instance:

* https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#a10...

> Extremely limited hardware support.

It's a reasonable argument that you'd only need it at the top end of the hardware: workloads that need all of that compute and memory are not that common, so downshifting some hardware into resource slices that are more typical is not crazy. And you can upshift again when needed. But if you had purchased 'smaller' cards because that's what you thought you (initially) needed, then you're stuck at that level: there's no way to upshift.

> Difficult to configure - like the early days of CUDA.

How hard is it to run nvidia-smi?
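
The whole dance is a handful of nvidia-smi invocations. Roughly, wrapped in Python (a sketch per the MIG user guide, untested; profile names vary by card, 1g.5gb being the smallest A100-40GB profile, and all of this needs root and an idle GPU):

    import subprocess

    def smi(*args):
        # Thin wrapper around nvidia-smi; raises if a command fails.
        subprocess.run(["nvidia-smi", *args], check=True)

    smi("-i", "0", "-mig", "1")                # enable MIG mode on GPU 0 (may need a GPU reset)
    smi("mig", "-lgip")                        # list the GPU instance profiles the card supports
    smi("mig", "-cgi", "1g.5gb,1g.5gb", "-C")  # create two instances plus their compute instances
    smi("-L")                                  # show the resulting MIG devices and their UUIDs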

> Meanwhile, don’t forget that if your workloads are cooperative, you can put all the processes you want on a single GPU and they’ll happily multitask. No security boundary of course, but who knows how good MIG is at that.

The security boundary of MIG is a lot better than MPS's, which basically has no security. I know several folks running HPC clusters that use it to isolate the Slurm workloads of different users. And my search-fu has found no CVEs or published papers jailbreaking out of MIG instances.

> I’d greatly prefer better tools for cooperative GPU sharing like per process memory limits or compute priority levels. Also seems like it should be way easier to implement.

This is what MPS is for:

* https://docs.nvidia.com/deploy/mps/index.html

* https://man.archlinux.org/man/extra/nvidia-utils/nvidia-cuda...
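
E.g., roughly, in Python (a sketch, not a recipe: worker.py is a placeholder, and the pinned-memory-limit variable needs a reasonably recent driver on Volta or newer):

    import os
    import subprocess

    # Start the MPS control daemon (normally done once per user/GPU).
    subprocess.run(["nvidia-cuda-mps-control", "-d"], check=True)

    env = dict(os.environ)
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = "25"   # cap this client at ~25% of the SMs
    env["CUDA_MPS_PINNED_DEVICE_MEM_LIMIT"] = "0=8G"  # cap allocations on device 0 at 8 GiB

    subprocess.run(["python", "worker.py"], env=env, check=True)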

shaklee3
Or green contexts

How real is the risk of information leakage if I'm on a shared GPU with multiple users?
throw0101d OP
Contra another comment: fairly low. (Or at least my search-fu has not been able to find any CVEs or published papers about breaking isolation between MIG instances. MPS should generally be used only by one user, so that multiple of their own CUDA apps can attach to one (v)GPU.)

MIG is used a lot in HPC and multi-tenancy cloud, where isolation is important. See Figure 1 and §6.2:

* https://docs.nvidia.com/datacenter/tesla/mig-user-guide/

The card is actually sliced into different instances (they show up as separate /dev/nvidiaX devices), each with their own SMs, L2, and DRAM, isolated from one another. (MPS is for the same user to share a GPU instance: it allows multiple CUDA apps to attach, and time-slicing occurs.)
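
You can see the separation from NVML. A quick pynvml sketch from my notes (untested; double-check the exact call names against your pynvml version):

    import pynvml  # pip install nvidia-ml-py

    pynvml.nvmlInit()
    parent = pynvml.nvmlDeviceGetHandleByIndex(0)

    current, pending = pynvml.nvmlDeviceGetMigMode(parent)
    print("MIG enabled:", current == 1)

    # MIG slots are sparsely populated; each instantiated slice reports
    # its own UUID and its own (isolated) memory.
    for slot in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(parent)):
        try:
            mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(parent, slot)
        except pynvml.NVMLError:
            continue  # slot not in use
        mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
        print(pynvml.nvmlDeviceGetUUID(mig), mem.total // 2**20, "MiB")

    pynvml.nvmlShutdown()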

saagarjha
Is anyone actually looking at this platform?
throw0101d OP
> Is anyone actually looking at this platform?

Question unclear: looking at it to use (yes: lots in HPC, hypervisors), or looking at it from a security POV (don't know)?

saagarjha
Yeah I'm talking about the latter
LPisGood
I remember a few years ago my hardware security professor suggested we try to implement Rowhammer on GPU. I ended up doing something else, but it looks like someone got there: https://arxiv.org/abs/2507.08166
doctorpangloss
For MIG the risk is low; the exploit would have to be exotic.

MPS should only be used where all the workloads trust each other. It is similar to running multiple games on your computer simultaneously.

You cannot use NVLink with MPS or MIG: it is not isolated, and malformed NVLink messages can be authored in userspace and can crash the whole GPU. Some vendors, like Modal, allow you to request NVLink'd shared GPUs anyway.

MIG only makes sense for cloud providers. MPS only makes sense for interactive (read: not ML) workloads. Workloads needing more than 1 GPU cannot use either.

woadwarrior01
throw0101d OP
I do not see MIG mentioned in either paper. I do not think the papers are examining isolation security between instances, which the GP was asking about.
woadwarrior01
Yeah, I only posted two links from my notes, from when I was looking at this a few months ago. Here's one on MIG.

https://arxiv.org/abs/2207.11428

throw0101d OP
As per sibling comment, this is about utilization efficiency and not breaking isolation (between MIG instances). The conclusion:

> In this paper, we presented MISO, a technique to leverage the MIG functionality on NVIDIA A100 GPUs to dynamically partition GPU resources among co-located jobs. MISO deploys a learning-based method to quickly find the optimal MIG partition for a given job mix running in MPS. MISO is evaluated using a variety of deep learning workloads and achieves an average job completion time that is lower than the unpartitioned GPU scheme by 49% and is within 10% of the Oracle technique.

stygiansonic
That paper doesn't seem to be about security vulnerabilities in MIG, but rather about using it to improve workload efficiency.
