petu
- I think your idea of MoE is incorrect. Despite the name, the "experts" aren't specialized in any particular domain; the set of active experts changes on more or less every token -- so swapping them into VRAM is not viable, and they just get executed on the CPU (llama.cpp).
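To see why per-token routing defeats expert swapping, here's a minimal sketch of top-k gating (pure Python, illustrative names like `route` and `gate` are my own, not from any real MoE implementation). Each token's hidden vector is scored against a router matrix and the top-k experts win, so consecutive tokens routinely pick different subsets:

```python
import random

def route(token_vec, gate_weights, k=2):
    # Score each expert via a dot product with the token's hidden
    # vector, then keep the indices of the k highest-scoring experts.
    scores = [sum(w * x for w, x in zip(row, token_vec))
              for row in gate_weights]
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

random.seed(0)
num_experts, dim = 8, 16
gate = [[random.gauss(0, 1) for _ in range(dim)]
        for _ in range(num_experts)]

# Two different tokens will generally activate different expert
# subsets, which is why you can't pin a few "hot" experts in VRAM.
tok_a = [random.gauss(0, 1) for _ in range(dim)]
tok_b = [random.gauss(0, 1) for _ in range(dim)]
print(route(tok_a, gate), route(tok_b, gate))
```

Since the active pair can change on every token, prefetching experts to the GPU would mean a PCIe transfer per token, so runtimes like llama.cpp simply run the expert weights from system RAM on the CPU instead.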
- Use a ~3-bit quantized model with llama.cpp; Unsloth makes good quants:
https://docs.unsloth.ai/models/tutorials-how-to-fine-tune-an...
Note that llama.cpp doesn't try to be a production-grade engine; it's more focused on local usage.