And before you bring up the “efficiency” of the Mac: I’ve done the math. Between the Mac being much slower (so it has to run longer for the same job) and the fact that you can power-limit the discrete GPUs to 200-250W each while losing only a few percent of LLM performance, it’s the same price or cheaper to operate the discrete GPUs for the same workload.
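For the curious, here’s the shape of that math as a quick sketch. Every number in it (throughput, power draw, electricity price) is an illustrative assumption, not a benchmark; plug in your own figures:

```python
# Back-of-envelope: energy cost to generate the same number of tokens
# on a power-limited discrete GPU vs. a Mac.
# All figures below are assumptions for illustration, not measurements.

KWH_PRICE = 0.15  # assumed electricity price, USD per kWh

def energy_cost(tokens: float, tok_per_s: float, watts: float) -> float:
    """USD to generate `tokens` at `tok_per_s`, drawing `watts` the whole time."""
    hours = tokens / tok_per_s / 3600
    return hours * (watts / 1000) * KWH_PRICE

WORKLOAD = 10_000_000  # tokens to generate (hypothetical workload)

# Hypothetical: a discrete GPU power-limited to ~250W that runs ~2x
# faster than a Mac drawing ~150W at load.
gpu_cost = energy_cost(WORKLOAD, tok_per_s=40, watts=250)
mac_cost = energy_cost(WORKLOAD, tok_per_s=20, watts=150)

print(f"GPU: ${gpu_cost:.2f}  Mac: ${mac_cost:.2f}")
# The GPU draws more power but finishes in half the time, so the
# per-token energy cost lands in the same ballpark or below.
```

The point of the sketch is just that wall-clock time matters as much as wattage: energy is power times time, so a faster machine at higher power can still cost the same or less per token.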
My point was not that “it isn’t really that slow”; my point is that Macs are slower than dedicated GPUs while being just as expensive (or more expensive, depending on the specific scenario) to purchase and operate.
And I did my analysis using the Mac Studio, which is faster than the equivalent MacBook Pro under sustained load (and also isn’t portable). So if you’re using a MacBook, my guess is that your performance-per-watt numbers are worse than the ones I was working with.
An Ultra is roughly 2x the power of a Max, but the Max itself is pretty beefy, and it has more than enough GPU power for any model you can fit into the ~48GB of RAM available to the GPU on a 64GB machine (macOS reserves the rest for the system by default).
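To make the ~48GB figure concrete, here’s a rough fit check. The quantization levels and the ~20% overhead factor for KV cache and runtime are assumptions, not exact numbers:

```python
# Rough fit check: which models fit in ~48GB of GPU-addressable RAM?
# Weight size in GB ~= params (billions) * bits / 8; KV cache,
# activations, and runtime overhead are lumped into a fudge factor.

AVAILABLE_GB = 48   # GPU-usable RAM on a 64GB Mac (per the post above)
OVERHEAD = 1.2      # assumed ~20% extra for KV cache and runtime

def fits(params_b: float, bits: int) -> bool:
    weights_gb = params_b * bits / 8  # e.g. 70B @ 4-bit ~= 35GB
    return weights_gb * OVERHEAD <= AVAILABLE_GB

for params_b, bits in [(7, 16), (13, 8), (34, 4), (70, 4), (70, 8)]:
    verdict = "fits" if fits(params_b, bits) else "too big"
    print(f"{params_b}B @ {bits}-bit: {verdict}")
```

By this estimate a 70B model at 4-bit quantization squeaks in, while the same model at 8-bit does not, which is roughly the class of model a 64GB machine tops out at.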
In pretty much any other situation, using dedicated GPUs is 1) definitely faster, like 2x the speed or more depending on your use case, and 2) the same cost or possibly cheaper. That’s all I’m saying.