My personal sneaking suspicion is that publicly offered models are using way less compute than people think. Modern mixture-of-experts (MoE) models use top-k routing, where each token only activates a few of the experts, meaning even SOTA models may not be spending much more compute per token than a dense 70-80B non-MoE model.
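
A minimal sketch of what top-k expert routing looks like, in PyTorch. The layer sizes, expert count, and k=2 here are illustrative assumptions, not taken from any particular model; the point is just that only k of the expert MLPs ever run per token:

```python
# Minimal top-k MoE layer sketch (illustrative sizes, not any real model).
# With 8 experts and k=2, only 2 expert MLPs run per token, so per-token
# FLOPs scale with k rather than with the total parameter count.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                     # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)        # renormalize over chosen experts
        out = torch.zeros_like(x)
        # Evaluate only the selected experts; the other n_experts - k are
        # skipped entirely, which is where the compute saving comes from.
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                w = weights[mask, slot].unsqueeze(-1)
                out[mask] += w * self.experts[e](x[mask])
        return out

x = torch.randn(16, 512)
print(TopKMoE()(x).shape)  # torch.Size([16, 512])
```

On these assumed sizes the layer holds 8 experts' worth of parameters but does roughly a quarter of the dense-equivalent expert FLOPs per token, which is the gap between parameter count and actual compute the comment is pointing at.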