Note that while UMA is great in the sense that it allows large models to be run at all, M-series chips aren't faster[1] when the model fits in VRAM.

  1: screenshot from [2]: https://www.igorslab.de/wp-content/uploads/2023/06/Apple-M2-ULtra-SoC-Geekbench-5-OpenCL-Compute.jpg
  2: https://wccftech.com/apple-m2-ultra-soc-isnt-faster-than-amd-intel-last-year-desktop-cpus-50-slower-than-nvidia-rtx-4080/

The problem is that you're limited to 24 GB of VRAM unless you pay through the nose for datacenter GPUs, whereas you can get an M-series chip with 128 GB or 192 GB of unified memory.
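To put rough numbers on that, here is a back-of-the-envelope sketch; the model sizes, bit widths, and the overhead factor for KV cache and activations are illustrative assumptions, not measurements:

```python
# Rough memory-footprint check: parameters * bytes-per-weight, padded by a
# fudge factor for KV cache and activations. Numbers are illustrative.
def fits(params_billion, bits_per_weight, memory_gb, overhead=1.2):
    weight_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb * overhead <= memory_gb

print(fits(70, 4, 24))    # False -- a 4-bit 70B model (~35 GB) won't fit in 24 GB of VRAM
print(fits(70, 4, 128))   # True  -- but it fits comfortably in 128 GB of unified memory
print(fits(7, 16, 24))    # True  -- a 7B model at fp16 (~14 GB) fits on a consumer GPU
```

Under those assumptions, the models that benefit from unified memory are exactly the ones that overflow a 24 GB card, which is the trade-off being discussed here.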
Sure! The point is just that they're not million-times-faster magic chips that will make NVIDIA go bankrupt tomorrow. That's all. A laptop with up to 128 GB of "VRAM" is a great option, absolutely no doubt about that.
They are powerful, and I agree with you: it's nice to be able to run Goliath locally, but it's a lot slower than on my 4070.
That's OpenCL compute; LLMs should ideally be hitting the neural accelerator, not running on general-purpose GPU compute shaders.
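For illustration, one way to steer inference toward the Neural Engine rather than the GPU on Apple silicon is to pin the compute units when loading a Core ML model with coremltools. The model path is hypothetical, and whether a given LLM's layers actually get scheduled onto the ANE depends on the model and OS version:

```python
import coremltools as ct

# Load a converted Core ML model and restrict execution to CPU + Neural Engine,
# so inference is not dispatched to the GPU's compute shaders.
model = ct.models.MLModel(
    "model.mlpackage",                        # hypothetical converted model
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # prefer the ANE over the GPU
)
# model.predict({...}) would then run on the Neural Engine where supported,
# falling back to CPU for any unsupported layers.
```

This is only a sketch of the compute-unit selection; benchmarks like the Geekbench OpenCL score cited above say little about what the ANE path can do.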
