UncleOxidant
Or maybe models that are much more task-focused? Like models that are trained on just math & coding?
Isn't that what the mixture-of-experts trick all the big players use is? A bunch of smaller, tightly focused models?
Not exactly. In MoE, a router picks a small subset of "experts" (feed-forward sub-networks, not separate task-specific models) for each token at each MoE layer. Only the chosen experts run, so compute per token drops and inference gets faster, but all of the experts' weights still have to be loaded, so the RAM requirement stays the same as a dense model of the same total parameter count.
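Roughly, it looks like this. Below is a minimal sketch of top-k MoE routing in PyTorch; the sizes, names, and the whole class are illustrative assumptions, not any particular model's implementation. Note that every expert lives in `self.experts` and is always in memory, even though only `top_k` of them run per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: route each token to top_k experts."""
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        # All experts' weights stay resident -> RAM cost of a dense model
        # with the same total parameter count.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.top_k = top_k

    def forward(self, x):                         # x: (n_tokens, d_model)
        logits = self.router(x)                   # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)      # mixing weights for chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so compute per token
        # is a fraction of running all experts.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(MoELayer()(tokens).shape)  # torch.Size([10, 64])
```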