And before you bring up the “efficiency” of the Mac: I’ve done the math. Between the Mac being much slower (so it has to run longer for the same job) and the fact that you can power-limit the discrete GPUs to 200-250W each while losing only a few percent of LLM performance, it’s the same price or cheaper to operate the discrete GPUs for the same workload.
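For the curious, here’s the shape of that math as a quick sketch. Every number in it (throughput, power draw, electricity price) is an illustrative assumption, not a benchmark; plug in your own figures:

```python
# Back-of-envelope: energy cost to generate the same number of tokens
# on a power-limited discrete GPU vs. a Mac.
# All figures below are assumptions for illustration, not measurements.

KWH_PRICE = 0.15  # assumed electricity price, USD per kWh

def energy_cost(tokens: float, tok_per_s: float, watts: float) -> float:
    """USD to generate `tokens` at `tok_per_s`, drawing `watts` the whole time."""
    hours = tokens / tok_per_s / 3600
    return hours * (watts / 1000) * KWH_PRICE

WORKLOAD = 10_000_000  # tokens to generate (hypothetical workload)

# Hypothetical: a discrete GPU power-limited to ~250W that runs ~2x
# faster than a Mac drawing ~150W at load.
gpu_cost = energy_cost(WORKLOAD, tok_per_s=40, watts=250)
mac_cost = energy_cost(WORKLOAD, tok_per_s=20, watts=150)

print(f"GPU: ${gpu_cost:.2f}  Mac: ${mac_cost:.2f}")
# The GPU draws more power but finishes in half the time, so the
# per-token energy cost lands in the same ballpark or below.
```

The point of the sketch is just that wall-clock time matters as much as wattage: energy is power times time, so a faster machine at higher power can still cost the same or less per token.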
My point was not that “it isn’t really that slow”; my point is that Macs are slower than dedicated GPUs while being just as expensive (or more expensive, depending on the specific scenario) to purchase and operate.
And I did my analysis using the Mac Studio, which is faster than the equivalent MacBook Pro under sustained load (and also isn’t portable). So if you’re using a MacBook, my guess is that your performance-per-watt numbers are worse than the ones I was working with.
An Ultra is roughly 2x the power of a Max, but the Max itself is pretty beefy, and it has more than enough GPU power for any model you can fit into the ~48GB of RAM available to the GPU on a 64GB machine (macOS reserves the rest for the system by default).
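To make the ~48GB figure concrete, here’s a rough fit check. The quantization levels and the ~20% overhead factor for KV cache and runtime are assumptions, not exact numbers:

```python
# Rough fit check: which models fit in ~48GB of GPU-addressable RAM?
# Weight size in GB ~= params (billions) * bits / 8; KV cache,
# activations, and runtime overhead are lumped into a fudge factor.

AVAILABLE_GB = 48   # GPU-usable RAM on a 64GB Mac (per the post above)
OVERHEAD = 1.2      # assumed ~20% extra for KV cache and runtime

def fits(params_b: float, bits: int) -> bool:
    weights_gb = params_b * bits / 8  # e.g. 70B @ 4-bit ~= 35GB
    return weights_gb * OVERHEAD <= AVAILABLE_GB

for params_b, bits in [(7, 16), (13, 8), (34, 4), (70, 4), (70, 8)]:
    verdict = "fits" if fits(params_b, bits) else "too big"
    print(f"{params_b}B @ {bits}-bit: {verdict}")
```

By this estimate a 70B model at 4-bit quantization squeaks in, while the same model at 8-bit does not, which is roughly the class of model a 64GB machine tops out at.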
In pretty much any other situation, using dedicated GPUs is 1) definitely faster, like 2x the speed or more depending on your use case, and 2) the same cost or possibly cheaper. That’s all I’m saying.