Nvidia (NVDA) generates revenue with hardware, but digs moats with software.

The CUDA moat is widely underappreciated and misunderstood. Dethroning Nvidia demands more than SOTA hardware.

OpenAI, Meta, Google, AWS, AMD, and others have long attempted to eliminate the Nvidia tax, yet failed.

Without diving into the gory details, the simple proof is that billions were spent on inference last year by some of the most sophisticated technology companies in the world.

They had the talent and the incentive to migrate, but didn't.

In particular, OpenAI spent $4 billion on inference, 33% more than it spent on training, yet still ran on NVDA. Google owns leading chips and leading models, and has the tech talent to facilitate migrations, yet still cannot cross the CUDA moat and convince many inference customers to switch.

People are desperate to quit their NVDA-tine addiction, but they can't for now.

[Edited to include Google, even though Google owns the chips and the models; h/t @onlyrealcuzzo]


The CUDA moat is largely irrelevant for inference. The code needed for inference is small enough that there are, for example, bare-metal CPU-only implementations. That isn't what's limiting people from moving fully off Nvidia for inference. And you'll note that almost "everyone" in this game is in the process of developing their own chips.
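
To make the CPU-only point concrete, here is a minimal sketch using llama.cpp's Python bindings; the model file name and prompt are illustrative placeholders, not a specific deployment:

    # Minimal CPU-only LLM inference via llama-cpp-python: no GPU, no CUDA.
    # Any GGUF-quantized checkpoint works; the path below is a placeholder.
    from llama_cpp import Llama

    llm = Llama(model_path="./model.Q4_K_M.gguf", n_ctx=2048)
    result = llm("Explain the CUDA moat in one sentence.", max_tokens=64)
    print(result["choices"][0]["text"])
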
My company recently switched from A100s to MI300s. I can confidently say that in my line of work, there is no CUDA moat. Onboarding took about a month, but afterwards everything was fine.
Alternatives exist, especially for mature and simple models. The point isn't that Nvidia has 100% market share, but rather that they command the most lucrative segment and none of these big spenders have found a way to quit their Nvidia addiction, despite concerted efforts to do so.

For instance, we experimented with AWS Inferentia briefly, but the value prop wasn't sufficient even for ~2022 computer vision models.

The calculus is even worse for SOTA LLMs.

The more you need to eke out performance gains and ship quickly, the more you depend on CUDA and the deeper the moat becomes.
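
To illustrate where that dependence shows up, here is a hedged sketch; flash-attn is used only as an example of a kernel library built around CUDA, and the tensor shapes are arbitrary:

    # Performance-critical model code often calls hand-tuned fused kernels
    # directly. The speedup is real, and so is the lock-in: porting this to
    # another vendor's stack means finding or writing an equivalent kernel,
    # not just changing the device string.
    import torch
    from flash_attn import flash_attn_func  # kernel library built around CUDA

    q = torch.randn(1, 1024, 16, 64, device="cuda", dtype=torch.float16)
    k = torch.randn(1, 1024, 16, 64, device="cuda", dtype=torch.float16)
    v = torch.randn(1, 1024, 16, 64, device="cuda", dtype=torch.float16)

    out = flash_attn_func(q, k, v, causal=True)  # fused attention kernel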

LLM inference is fine on ROCm. llama.cpp and vLLM both have very good ROCm support.

LLM training is also mostly fine. I have not encountered any issues yet.

Most of the CUDA moat comes from people who are repeating what they heard 5-10 years ago.
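
For what it's worth, the serving code itself is backend-agnostic. A minimal sketch, assuming vLLM was installed with either its CUDA or its ROCm build (the model name is illustrative):

    # The same vLLM script runs whether vLLM was built against CUDA or ROCm;
    # the backend is chosen at install/build time, not in the serving code.
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
    params = SamplingParams(temperature=0.7, max_tokens=64)
    outputs = llm.generate(["Why might inference move off Nvidia first?"], params)
    print(outputs[0].outputs[0].text)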

> OpenAI, Meta, AWS, AMD, and others have long attempted to eliminate the Nvidia tax, yet failed.

Gemini / Google runs and trains on TPUs.

You have no incentive to infer on AMD if you need to buy a massive Nvidia cluster to train.

Meta trains on Nvidia and infers on AMD. There is incentive if your inference costs are high.
Meta has also designed a second generation of its own AI accelerator chips.
Google was omitted because they own the hardware and the models, but in retrospect, they represent a proof point nearly as compelling as OpenAI. Thanks for the comment.

Google has leading models operating on leading hardware, backed by sophisticated tech talent who could facilitate migrations, yet Google still cannot leap over the CUDA moat and capture meaningful inference market share.

Yes, training plays a crucial role. This is where companies get shoehorned into the CUDA ecosystem, but if CUDA were not so intertwined with performance and reliability, customers could theoretically switch after training.

> yet Google still cannot leap over the CUDA moat and capture meaningful inference market share.

It's almost as if being a first-mover is more important than whether or not you use CUDA.

Both matter quite a bit. The first-mover advantage obviously rewards OEMs on a first-come, first-served basis, but CUDA itself isn't some light switch that OEMs can flip and get working overnight. Everyone would do it if it were easy, and even Google is struggling to find buy-in for their TPU pods and frameworks.

Short-term value has been dependent on how well Nvidia has responded to burgeoning demands. Long-term value is going to be predicated on the number of Nvidia alternatives that exist, and right now the number is still zero.

Google has a self-inflicted wound in how long it takes to get an API key.
The fact that this comment is DOWNVOTED despite being literally 1000% true is evidence that HN is full of loonies.
It's unclear why this drew downvotes, but to reiterate, the comment merely highlights historical facts about the CUDA moat and deliberately refrains from assertions about NVDA's long-term prospects or that the CUDA moat is unbreachable.

With mature models and minimal CUDA dependencies, migration can be justified, but this does not describe most of the LLM inference market today nor in the past.
