I'm thinking more along the lines of a pseudointellect over serial that you attach a $3 ESP32 to. Since it's basically tokens in, tokens out, let's just cut out the unnecessary parts. It's like querying the cloud models, except it's your own silicon, personally soldered to the ESP32, so nobody can break your home assistant with a system prompt update or a fine-tuning run.
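To make "tokens in, tokens out over serial" a bit more concrete, here is a minimal sketch of what the wire format could look like. The frame layout (a uint16 token count followed by one uint32 per token ID, little-endian) is a made-up assumption, and `encode_tokens` / `decode_tokens` are hypothetical names, not part of any existing ESP32 firmware or library.

```python
import struct
from typing import List


def encode_tokens(token_ids: List[int]) -> bytes:
    """Pack token IDs into one frame (hypothetical layout):
    uint16 count, then one uint32 per token, all little-endian."""
    if len(token_ids) > 0xFFFF:
        raise ValueError("too many tokens for a single frame")
    # struct raises struct.error if any ID does not fit in a uint32
    return struct.pack(f"<H{len(token_ids)}I", len(token_ids), *token_ids)


def decode_tokens(frame: bytes) -> List[int]:
    """Inverse of encode_tokens; rejects truncated or oversized frames."""
    if len(frame) < 2:
        raise ValueError("frame too short to hold a token count")
    (count,) = struct.unpack_from("<H", frame, 0)
    expected = 2 + 4 * count
    if len(frame) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(frame)}")
    return list(struct.unpack_from(f"<{count}I", frame, 2))
```

On the host side you could hand the encoded bytes to something like pyserial's `serial.Serial.write` and decode the reply frame the same way. A fixed-width length prefix keeps the ESP32 side trivial: read two bytes, then read exactly `4 * count` more.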
"General purpose CPUs are going to stay to become the little brain that orchestrates GPUs."
If that was going to happen, it would have happened.
CPUs are genuinely good at what they do, and "what they do" covers a lot of tasks that GPUs are actually terrible at. If all we had in the world were GPUs and someone invented the CPU, we'd hail them as a genius. A lot of people seem to think that GPUs are just "better", just ambiently better at everything, but that's light-years from the truth. They are quite spectacularly terrible at a lot of very common tasks. There are many very good reasons that GPUs are still treated as accelerators for CPUs and not vice versa.
But I suspect GPU-style parallel computing is going to dominate accelerated computing.
General purpose CPUs are going to stay to become the little brain that orchestrates GPUs.
The idea of transitioning directly from software to hardware might never become mainstream.