I think SLMs are developing very fast. A year ago, I couldn't have imagined a decent thinking model like Qwen, and now the field seems full of promise.
You're still missing the point. The comment you're responding to is talking about specialized models.
The point is still valid. If the big companies could save money by running multiple small specialised models on cheap hardware, they wouldn't be spending billions on the highest-spec GPUs.
SOTA models are larger than what can be run locally, though.
Obviously we'd all like to see smaller models perform better, but there's no reason to believe there's a hidden secret to making small, locally runnable models perform at the same level as the SOTA models from Anthropic and OpenAI. If there were, those companies would already be doing it.
There's research happening and progress being made at every model size.