Imagine you have 1 million GPUs and 99% utilization of theoretical performance across the system during inference. That would mean roughly 10k GPUs are effectively idle while still drawing power. You could try to identify which ones are idle, but you won't find them, because utilization is a dynamic process: all GPUs are under load, yet not all of them run at 100% performance, because the interconnects and networking can't deliver data fast enough, so the network itself becomes the bottleneck.
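To make that concrete, here's a toy calculation (the step times below are made-up numbers, just to illustrate the effect, not measurements from any real cluster):

```python
# Toy model: how network stalls erode effective GPU utilization.
# A GPU that spends part of each step waiting on the interconnect is
# "busy" (it holds an active job) but delivers less than its
# theoretical FLOPs -- so it never shows up as idle in monitoring.

def effective_utilization(compute_ms: float, comm_stall_ms: float) -> float:
    """Fraction of theoretical performance actually delivered per step."""
    return compute_ms / (compute_ms + comm_stall_ms)

# Hypothetical step: 9.9 ms of math plus 0.1 ms waiting on the network.
u = effective_utilization(compute_ms=9.9, comm_stall_ms=0.1)
print(f"effective utilization: {u:.0%}")  # 99%

# Across a 1M-GPU fleet, the missing 1% equals the compute of ~10k GPUs,
# yet no single GPU in the fleet is ever actually idle.
cluster_size = 1_000_000
print(f"equivalent idle GPUs: {cluster_size * (1 - u):,.0f}")  # 10,000
```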
So what you need is a very smart process for routing computation across the whole cluster. This is purely a software problem, not a hardware problem. It's the software Nvidia has been working on for years, and where AMD is years behind.
This is also why Jensen is absolutely right to say that competitors could offer their chips for free: Nvidia's key to TCO performance is the idea of one giant GPU, i.e. software and networking that allow for the highest utilization of a data center. You can't build a single GPU the size of 1 million GPUs, so you have to solve the utilization problem for a network of GPUs.
In the real world, utilization rates are well below 100%, so every additional percentage point of utilization is worth far more than the price of individual GPUs. The idea is that a company delivering 2-3x higher utilization can charge something like 5x more per chip and still deliver better TCO, because the chip itself is only part of the total cost: fewer, better-utilized nodes also mean less power, networking, and facility spend for the same throughput.
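A rough sketch of that argument with made-up numbers (the chip prices, per-slot overhead, and utilization figures below are illustrative assumptions, not vendor data):

```python
# Back-of-envelope TCO comparison with illustrative numbers.
# Assumption: per-slot overhead (power, cooling, facility, networking,
# operations) is the same no matter whose chip sits in the slot.

def tco_per_unit_of_work(chip_price: float, overhead_per_slot: float,
                         utilization: float) -> float:
    """Total cost to deliver one GPU's worth of useful throughput.

    Lower utilization means more slots are needed for the same work,
    multiplying both chip cost and per-slot overhead.
    """
    return (chip_price + overhead_per_slot) / utilization

# Vendor A: 5x chip price, 3x the cluster-level utilization (75% vs 25%).
a = tco_per_unit_of_work(chip_price=5.0, overhead_per_slot=2.0, utilization=0.75)
# Vendor B: cheap chip, poor cluster-level utilization.
b = tco_per_unit_of_work(chip_price=1.0, overhead_per_slot=2.0, utilization=0.25)

print(f"vendor A cost per unit of work: {a:.2f}")  # 9.33
print(f"vendor B cost per unit of work: {b:.2f}")  # 12.00
# A comes out ~22% cheaper per unit of useful work despite the 5x chip
# price -- but the result flips if overhead is small relative to the chip.
```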
From that perspective, the notion that Nvidia will own this AI future while others such as AMD and Intel stand by would be silly.
I'm already surprised it took this long. The Nvidia moat might be software, but not anything that warrants these kinds of margins at this scale. It is likely there will be strong price competition on hardware for inference.
What makes you think that? Or are all non-Nvidia GPUs x86?
For inference that’s hardly relevant, though?
For training it's not exactly insurmountable either.