HN isn't always very rational about voting. It would be a loss to judge any idea on that basis.
> It would take a lot of work to make a GPU do current CPU type tasks
In my opinion, that would be counterproductive. The advantage of GPUs is that they have a large number of very simple cores. Instead, just put a few separate CPU cores on the same die, or on a separate die. Or you could even have a forest of GPU cores with a few CPU cores interspersed among them - sort of like how modern FPGAs have logic tiles, memory tiles and CPU tiles spread across the fabric. I doubt it would be called a GPU at that point.
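For flavor, the CUDA programming model already encodes that division of labor in software - serial, branchy control on a latency-optimized CPU core, wide sweeps on the sea of simple cores - just across a driver/fabric hop rather than between neighboring tiles. A minimal sketch (names and numbers are mine, purely illustrative):

```cuda
#include <cstdio>

// Data-parallel sweep: the kind of work the many simple GPU cores are for.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float *x;
    cudaMallocManaged(&x, n * sizeof(float));  // unified memory: both sides see it
    for (int i = 0; i < n; ++i) x[i] = 1.0f;

    // Serial, branchy control logic stays on a latency-optimized CPU core...
    float a = (x[0] > 0.0f) ? 2.0f : 0.5f;

    // ...which then hands the wide parallel part to the GPU cores.
    scale<<<(n + 255) / 256, 256>>>(x, a, n);
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);  // 2.0
    cudaFree(x);
    return 0;
}
```

With CPU tiles interspersed among the GPU cores, that handoff would happen between neighbors on the die instead of across it.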
Also, you'd need to add extra hardware for various OS support functions (privilege levels, address space translation/MMU) that are currently missing from the GPU. But the idea is otherwise sound; you can think of the proposed 'Mill' CPU architecture as one variety of it.
Perhaps I should have phrased it differently. CPU and GPU cores are designed for different types of workloads. The rest of your comment seems similar to what I was imagining.
Still, I don't think that enhancing the GPU cores with CPU capabilities (out-of-order execution, privilege rings, an MMU, etc. from your examples) is the best idea. You may end up with the advantages of neither and the disadvantages of both. I was suggesting that you could instead have a few dedicated CPU cores distributed among the numerous GPU cores. Finding the right balance of GPU to CPU cores may be the key to achieving the best performance on such a system.
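To put a rough number on that balance, a standard Amdahl's-law back-of-the-envelope (a toy model, not a design study): if a fraction s of the workload is serial and has to run on the CPU cores, while the rest spreads over N GPU cores, then

```latex
\text{speedup} = \frac{1}{s + \frac{1 - s}{N}} \;\le\; \frac{1}{s}
```

With s = 5%, no number of GPU cores gets you past 20x - which is why a handful of fast CPU cores for the serial fraction can matter as much as the sea of simple ones.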
> If you look at your typical phone or laptop SoC, the CPU is only a small part.
Keep in mind that die area doesn't always correspond to the throughput (average rate) of the computations done on it. That area may instead be allocated for higher computational bandwidth (peak rate) and lower latency - in other words, for getting the results of a large batch of computations back quickly, even if the circuits then sit idle for most cycles. I don't know how mobile SoCs fare on those quantities.
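A toy example of the distinction, with round hypothetical numbers: a block that can issue 512 FMAs (1024 FLOPs) per cycle but is only busy 10% of the time delivers

```latex
\text{peak} = 1024~\text{FLOP/cycle}, \qquad
\text{average} = 0.10 \times 1024 \approx 102~\text{FLOP/cycle}.
```

The area is buying the width of the burst (and hence the latency of each batch), not the sustained rate.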
In mobile SoCs, a good chunk of this is power efficiency. On a battery-powered device, there's always a tradeoff between spending die area to make something like 4K video playback more power efficient and spending it on general-purpose compute.
Desktop-focused SKUs are more likely to spend a metric ton of die area on bigger caches close to your compute.
As for what the HW looks like, we already know: look at Strix Halo as an example. We are just getting bigger and bigger integrated GPUs. Most of the FLOPS on that chip come from the GPU part.
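For a rough sense of why, peak FP32 throughput scales as cores x FLOPs/cycle/core x clock. With illustrative round numbers (not datasheet values for Strix Halo) - say 16 CPU cores each doing 32 FP32 FLOPs/cycle (one 512-bit FMA) versus 2560 GPU lanes each doing 2 FLOPs/cycle (one FMA):

```latex
\text{CPU: } 16 \times 32 \times 5\,\text{GHz} \approx 2.6~\text{TFLOPS}, \qquad
\text{GPU: } 2560 \times 2 \times 2.5\,\text{GHz} \approx 12.8~\text{TFLOPS}.
```

The GPU side wins simply by having vastly more lanes, even at a lower clock.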
Also, I'd say that if you buy, for example, a MacBook with an M4 Pro chip, it already is a big GPU attached to a small CPU.
It would take a lot of work to make a GPU do current CPU type tasks, but it would be interesting to see how it changes parallelism and our approach to logic in code.
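One concrete way it changes "logic in code": GPUs execute branches by masking lanes, so CPU-style data-dependent branching serializes within a warp. A minimal sketch of the effect (illustrative, not a benchmark):

```cuda
#include <cstdio>

__global__ void branchy(int *out, const int *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Within a 32-thread warp, lanes disagree on this condition, so the
    // hardware runs BOTH paths with complementary lane masks - the warp
    // pays the cost of the two branches added together.
    if (in[i] % 2 == 0)
        out[i] = in[i] * 3;  // even lanes active here
    else
        out[i] = in[i] + 7;  // odd lanes active here
}

int main() {
    const int n = 1024;
    int *in, *out;
    cudaMallocManaged(&in, n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(int));
    for (int i = 0; i < n; ++i) in[i] = i;

    branchy<<<(n + 255) / 256, 256>>>(out, in, n);
    cudaDeviceSynchronize();

    printf("out[0]=%d out[1]=%d\n", out[0], out[1]);  // 0 and 8
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Writing for such a machine tends to mean turning control flow into data - sorting or partitioning inputs so whole warps take the same path - which is exactly the kind of rethink of "logic in code" being suggested here.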