
Phi4-mini runs on a basic laptop CPU at 20T/s… how is that slow? Without optimization…

dist-epoch
I was running Qwen3-32B locally even faster, at 70 T/s, and it was still way too slow for me. I'm generating thousands of tokens of output per request (not coding). Running locally, I could get about 6 million tokens per day and pay for the electricity, or I can get more tokens per day from Google Gemini 2.5 Flash for free (rough math below).

Running models locally is a privilege for the rich and those with too much disposable time.
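For context, the "6 million tokens per day" figure lines up with 70 T/s running around the clock. A minimal back-of-the-envelope sketch (Python), assuming continuous, uninterrupted generation with no prompt-processing or idle time:

    # Rough check of the local-throughput claim.
    # Assumptions (not stated in the comment): the machine generates
    # tokens 24 hours a day with no pauses or prompt-processing overhead.
    TOKENS_PER_SECOND = 70          # reported local Qwen3-32B speed
    SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

    tokens_per_day = TOKENS_PER_SECOND * SECONDS_PER_DAY
    print(f"{tokens_per_day:,} tokens/day")  # 6,048,000 -- roughly the quoted 6 million

In practice the real number would be lower, since prompt processing, batching gaps, and downtime all eat into the 24-hour window.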
