Noob here. Why is that number bad?
LLM token generation is mostly memory-bandwidth-bound: to produce each token, the hardware has to stream every active parameter through memory. So if your model has 8 billion parameters and each parameter is one byte, then at 256 GB/s you can't do better than 32 tokens per second. If you try to load a model that's 80 gigs, you only get 3.2 tokens per second, which is kinda bad for something that costs 3-4k.
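A quick back-of-the-envelope in Python, assuming decode is purely bandwidth-bound (real throughput will be a bit lower once you account for KV-cache reads and other overhead):

    # Upper bound on decode speed: each generated token streams
    # every parameter byte through memory once.
    def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
        return bandwidth_gb_s / model_size_gb

    print(max_tokens_per_sec(256, 8))   # 8B params @ 1 byte each -> 32.0 tok/s
    print(max_tokens_per_sec(256, 80))  # 80 GB model -> 3.2 tok/s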
There are newer "mixture of experts" (MoE) models that might be, say, 120B parameters total but only activate around 5B parameters per token (the specific parameters are chosen by a much smaller routing network). That's the kind of model that should excel on this machine; see the sketch below. Unfortunately, those models also run really well under hybrid inference, where the GPU handles the small-but-computationally-complex fully connected layers while the CPU handles the large-but-computationally-easy expert layers.
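Same arithmetic with made-up-but-typical MoE numbers: only the active parameters have to be read per token, so the bandwidth ceiling is set by active bytes, not total bytes:

    # MoE: per token you only read the router plus the selected experts.
    total_gb  = 120  # whole model at 1 byte/param
    active_gb = 5    # parameters actually touched per token

    print(256 / active_gb)  # ~51 tok/s ceiling with MoE routing
    print(256 / total_gb)   # ~2.1 tok/s if every parameter were read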
This product doesn't really have a niche for inference. Training and prototyping are another story, but I'm a noob on those topics.
Running LLMs will be slow and training them is basically out of the question. You can get a Framework Desktop with similar memory bandwidth for less than a third of the price of this thing (though it isn't NVIDIA).
> Running LLMs will be slow and training them is basically out of the question
I think it's the reverse: the use case for these boxes is basically training and fine-tuning, not inference.
File this one in the blue folder like the DGX