Note that while UMA is great in the sense that it allows these models to be run at all, M-series chips aren't faster[1] when the model fits in VRAM.
The problem is you're limited to 24 GB of VRAM unless you pay through the nose for datacenter GPUs, whereas you can get an M-series chip with 128 GB or 192 GB of unified memory.
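For a sense of scale, here's a rough back-of-the-envelope sketch of weight memory alone (the model sizes and bit widths are just illustrative, and KV cache plus activations add more on top):

```python
# Rough weight-memory estimate: parameter count times bytes per weight.
# Weights only; ignores KV cache, activations, and runtime overhead.
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (7, 70, 120):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit ~= {weight_memory_gb(params, bits):.0f} GB")
```

Even a 70B model at 4-bit is around 35 GB of weights, which already overflows a 24 GB card but sits comfortably in 128 GB of unified memory.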
Sure! The point is that they're not magic chips a million times faster that will make NVIDIA go bankrupt tomorrow. That's all. A laptop with up to 128 GB of "VRAM" is a great option, absolutely no doubt about that.
They are powerful, but I agree with you: it's nice to be able to run Goliath locally, yet it's a lot slower than my 4070.
That's OpenCL compute; LLMs ideally should be hitting the neural accelerator, not running on generalized GPU compute shaders.
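If it helps, here's a minimal sketch of what "hitting the neural accelerator" looks like from Python via coremltools (the model path is hypothetical, and whether layers actually land on the ANE is up to Core ML's scheduler and per-op support):

```python
import coremltools as ct

# Load an existing Core ML model package and ask the runtime to schedule it
# on the Apple Neural Engine (plus CPU) rather than generic GPU compute.
# "MyLLM.mlpackage" is a placeholder path, not a real model.
model = ct.models.MLModel(
    "MyLLM.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # use ct.ComputeUnit.ALL to also allow the GPU
)
```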