Not anywhere close to that.
Those 350k GPUs you talk about aren't linked together. They also definitely aren't all H100s.
To train a GPT-4 scale model you need a single cluster where all the GPUs are tightly linked together. At the scale of 20k+ GPUs, the price you pay in networking to link those GPUs is almost the same as the price of the GPUs themselves. It's really hard and expensive to do.
FB has maybe 2 such clusters, not more than that. And I'm somewhat confident one of those clusters is an A100 cluster.
So they can train maybe 6 GPT-4 scale models every 90 days.
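Back-of-envelope, using rough public estimates (the widely cited ~2×10^25 FLOPs figure for GPT-4's training compute, ~990 dense BF16 TFLOP/s per H100, and an assumed ~35% utilization), the numbers land in that ballpark:

```python
# Back-of-envelope only; every input is a rough public estimate or an assumption,
# not an official Meta/OpenAI figure.
SECONDS_PER_DAY = 86_400

GPT4_TRAIN_FLOPS = 2.1e25   # widely cited estimate of GPT-4's total training compute
H100_PEAK_FLOPS  = 9.9e14   # ~990 TFLOP/s dense BF16 per H100
MFU              = 0.35     # assumed model FLOPs utilization at cluster scale

def gpt4_scale_runs(num_gpus: int, days: int = 90) -> float:
    """Rough count of GPT-4 scale training runs num_gpus could finish in `days`."""
    effective_flops_per_sec = num_gpus * H100_PEAK_FLOPS * MFU
    return effective_flops_per_sec * days * SECONDS_PER_DAY / GPT4_TRAIN_FLOPS

print(f"one ~25k-GPU cluster, 90 days: {gpt4_scale_runs(25_000):.1f}")     # ~3
print(f"two such clusters, 90 days:    {2 * gpt4_scale_runs(25_000):.1f}")  # ~6
print(f"all 350k GPUs, 90 days:        {gpt4_scale_runs(350_000):.1f}")     # ~45
```

Which is roughly where both the "6" above and the "~50" in the root comment come from; the real disagreement is over how many of those GPUs can actually be wired into one training fabric.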
350,000 H100s, and 600,000 H100 equivalents in total (perhaps AMD Instinct cards?), on top of the hundreds of thousands of legacy A100s.
And I'm certain the order for B100s will be big. Very big.
Even the philanthropic Chan Zuckerberg Initiative currently rocks 1,000 H100s, probably none used for inference.
They are going ALL OUT
Just like they did for their metaverse play, and that didn't work out very well.
This was even true in Star Trek. People could do literally anything on a holodeck and the writers still had them going to Risa for a holiday.
There is no chance of VR going mainstream until someone solves the fundamental human problem of people preferring to do things in real life.
Sounds like more unsubstantiated hype from a company desperate to sell a product that was very expensive to build. I guess we'll see, but I'm not optimistic for them.
What do they use them for?
And then someone else starts giving away shovels for free.
Ah, I see -- it's more like a "level 2 gold rush".
So a level 1 gold rush is: There's some gold in the ground, nobody knows where it is, so loads of people buy random bits of land for the chance to get rich. Most people lose, a handful of people win big. But the retailers buying shovels at wholesale and selling them at a premium make a safe, tidy profit.
But now that so many people know the maxim, "In a gold rush, sell shovels", there's now a level 2 gold rush: A rush to serve the miners rushing to find the gold. So loads of retailers buy loads and loads of shovels and set up shop in various places, hoping the miners will come. Probably some miners will come, and perhaps those retailers will make a profit; but not nearly as much as they expect, because there's guaranteed to be competition. But the company making the shovels and selling them at a premium makes a tidy profit.
So NVIDIA in this story is the manufacturer selling shovels to retailers; and all the companies building out massive GPU clouds are the retailers rushing to serve miners. NVIDIA is guaranteed to make a healthy profit off the GPU cloud rush as long as they play their cards right (and they've always done a pretty decent job of that in the past); but the vast majority of those rushing to build GPU clouds are going to lose their shirts.
And their business model is shovel-fleet logistics and maintenance... :p
Have we been living in the same universe the last 10 years? I don't see this ever happening. Related recent news (literally posted yesterday) https://www.axios.com/2024/07/02/chevron-scotus-biden-cyber-...
Red state blue collar workers got their candidate to pass tariffs. What happens when both blue state white collar workers and red state blue collar workers need to contend with AI? Perhaps not within the next 10 years, but certainly within 20 years!
And if you think 20 years is a long time... 2004 was when Halo 2 came out
I don't know what power you imagine SWEs and PhDs possess, but the last time their employers flexed their power by firing them in droves (despite record profits), the employees sure seemed powerless, and society shrugged it off and/or expressed barely-concealed schadenfreude.
That settlement favored Apple, Google and the other conspirators because they only paid out a fraction of what they would have paid in salaries absent the collusion - so the settlement was not exactly a show of force by the engineers. Additionally, this was after a judge had thrown out a lower settlement amount the lawyers representing the class had agreed to.
But agreed, between the unions with political pull and "AI safety" grifters I suspect there could be some level of regulatory risk, particularly for the megacorps in California. I doubt it will be some national thing in the US absent a major political upheaval. Definitely possible in the EU which will probably just be a price passed on to customers or reduced access, but that's nothing new for them.
Tucker Carlson at one point said if FSD was going to take away trucking jobs we should stop that with regulation.
But in the general sense, I think it's tautologically correct to say better models always lead to better predictions, which always give an edge in competitions on an individual or societal level. So long term I do believe learning trumps ignorance, not in all cases but on average.
> What happens when both blue state white collar workers and red state blue collar workers need to contend with AI? Perhaps not within the next 10 years, but certainly within 20 years!
Populism. Probably the fascist right-wing kind, but I expect some form of populism. Related, if we're talking about a 20 year time horizon, I'm genuinely unsure if society will still exist in any recognizable fashion at the rate we're going...
But there's bigger fish to fry for American politics and worker obsolescence is not really top of mind for anyone.
The same thinking stopped many legacy tech companies from becoming a “cloud” company ~20 years ago.
Fast forward to today and the margin for cloud compute is still obscene. And they all wish in hindsight they got into the cloud business way sooner than they ultimately did.
What ended up happening was Amazon was better at scale and lockin than everyone else. They gave Netflix a sweet deal and used it as a massive advertisement. It ended up being a rock rolling down a hill and all the competitors except ones with very deep pockets and the ability to cross-subsidize from other businesses (MSFT and Google) got crushed.
I thought Nvidia took that crown recently, though.
And then you come to companies that managed to streamline both and ran out of floor space in their data center because they had to hold onto assets for 3-5 years. At one previous employer, the smallest orderable unit of compute was a 44U rack. They eventually filled the primary data center and then it took them 2 years to Tetris their way out of it.
There is a hypothetical "but what if we honestly actually really really do", but that's such a waste of engineering time when there are so many other problems to be solved that it's implausible. The only time multi-cloud makes sense is when you have to meet customers where they're at, and have resources in whichever cloud your customers are using. Or if you're running arbitrage between the clouds and are reselling compute.
> Plus you have to pay for your own control plane when that’s already baked into the cloud provider’s charge model.
When you say "control plane" does this mean Kubernetes?

> I think it was because we were working on Reels. We always want to have enough capacity to build something that we can't quite see on the horizon yet. ... So let's order enough GPUs to do what we need to do on Reels and ranking content and feed. But let's also double that.
So there's an immense capacity inside Meta, but the _whole_ fleet isn't available for LLM training.
[0]: https://www.dwarkeshpatel.com/p/mark-zuckerberg?open=false#§...
What IS a huge problem is the almost complete lack of systematically acquired quantitative data on human health (and diseases) for a very large number (1 million subjects) of diverse humans WITH multiple deep-tissue biopsies (yes, essentially impossible) that are suitable for multiomics at many ages/stages and across many environments. (Note, we can do this using mice.)
Some specific examples/questions to drive this point home: What is the largest study of mRNA expression in humans? ANSWER: The small but very expensive NIH GTEx study (n max of about 1000 Americans). This study acquired postmortem biopsies for just over 50 tissues. And what is the largest study of protein expression in humans across tissues? Oh sorry, this has never been done although we know proteins are the work-horses of life. What about lipids, metabolites, metagenomics, epigenomics? Sorry again, there is no systematically acquired data at all.
What we have instead is a very large cottage-industry of lab-level studies that are structurally incoherent.
Some brag about the massive biomedical data we have, but it is truly a ghost and most real data evaporates within a few years.
Here is my rant on fundamental data design flaws and fundamental data integration flaws in biomedical research:
Herding Cats: The Sociology of Data Integration https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2751652/
But I also think the GP's claim and yours are not incompatible. I wonder how much survivorship bias this has, since it only considers those that are able to do research, and not those that would have but ended up continuing with another STEM job instead. We could be asking the counterfactual that I think the GP is implying: would more people have been interested in becoming cancer researchers if publications were open?
We can sort of see the effect because we have scihub now, which basically unlocks journal access for those that are comfortable with it, and I consider it plausibly having a significant effect for the population that has a research background without an academic affiliation. I've met a few biotech startup founders that switched from tech to bio and did self study + scihub outside of the university. The impetus for change I've heard a few times is: a loved one got X disease, I studied it, and I quit my less impactful tech job to work on bio stuff.
> Think of how cancer could have been cured a decade ago if information was allowed to flow freely from the 50's forward
might be a bit fanciful? Unless you're referring to something particular I'm unaware of.
The people best equipped and trained to deliver a cure for cancer (and then some, since it tends not to be particularly field-restricted) do have access.
I think the loss is more likely in engineering (to the publication's science), cheaper methods, more reliably manufacturable versions of lab prototypes, etc.
I doubt there are many people capable of cancer research breakthroughs who don't have access to cancer research, personally.
(And to be clear: I'm not capable of it.)
The schools I’ve worked with have access to everything I’ve needed. They didn’t advertise it but it’s also free for students.
2) There may be a few researchers who don't have unfettered access. Perhaps they paid $40 for a copy of a paper. Given the high cost of other parts of research labs, I find it hard to believe that any real possibility of curing cancer was halted because someone had to pay $40.
3) It's possible to imagine the opposite being the case. Perhaps someone had a key insight in a clever paper and decided to distribute it for free out of some info-anarchistic impulses. There it would sit in some FTP directory, uncurated, unindexed and uncared for. Perhaps the right eyes would find it. Perhaps they wouldn't. Perhaps the cancer researcher would be able to handle all of the LaTeX and FTP chores without slowing down research. Perhaps they would be distracted by sysadmin headaches and never make a crucial follow up discovery.
The copyrighted journal system provides curation and organization. Is it wonderful? Nah. Is it better than some ad hoc collection of FTP directories? Yes!
Your opinion may be that this scenario would never happen. In my opinion, this is more likely than your vision.
The precisions and mantissa/exponent ratios you want for inference are just different from what a mixed-precision, fault-tolerant, model- and data-parallel training pipeline wants.
Hopper is for training mega-huge attention decoders: TF32, bfloat16, hot paths to the SRAM end of the cache hierarchy with cache coherency semantics that you can reason about, parity gear for fault tolerance. It's just a different game.
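For concreteness, these are the standard bit layouts behind that mantissa/exponent trade-off (the layouts are the published format definitions; the comments on which way each format leans are my own gloss):

```python
# Sign / exponent / mantissa bit counts for common training and inference formats.
FORMATS = {
    "fp32":     (1, 8, 23),
    "tf32":     (1, 8, 10),  # fp32's exponent range, truncated mantissa (19 bits in a 32-bit container)
    "bf16":     (1, 8, 7),   # training-friendly: keeps fp32's dynamic range
    "fp16":     (1, 5, 10),
    "fp8_e4m3": (1, 4, 3),   # inference-leaning: more mantissa precision, less range
    "fp8_e5m2": (1, 5, 2),   # gradient-leaning: more range, less mantissa precision
}

for name, (sign, exp, mantissa) in FORMATS.items():
    print(f"{name:>8}: {sign + exp + mantissa:2d} bits (exponent {exp}, mantissa {mantissa})")
```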
If there's dedicated inferencing silicon (like say the thing created by Groq), all those GPUs will be power sucking liabilities, and then the REAL singularity superintelligence level training can begin.
Maybe. But we've barely scratched the surface of being more economical with data.
I remember back in the old days, there was lots of work on e.g. dropout and data augmentation etc. We haven't seen too much of that with the likes of ChatGPT yet.
I'm also curious to see what the future of multimodal models holds: you can create almost arbitrary amounts of extra data by pointing a webcam at the world, especially when combined with a robot, or by letting your models play StarCraft or Diplomacy against each other.
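On the dropout point above, a minimal NumPy sketch of the idea (inverted dropout; purely illustrative, not tied to any particular LLM training stack):

```python
import numpy as np

def inverted_dropout(x: np.ndarray, p_drop: float, rng: np.random.Generator) -> np.ndarray:
    """Zero out a random fraction p_drop of activations and rescale the survivors,
    so the expected activation value is unchanged at training time."""
    keep_mask = rng.random(x.shape) >= p_drop
    return x * keep_mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
activations = rng.normal(size=(4, 8))
print(inverted_dropout(activations, p_drop=0.1, rng=rng))
```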
What it actually means is that they are training next gen models that are 50X larger.
And, considering MS and OpenAI are planning to build a $100 billion AI training computer, these 350K GPUs are just a tiny portion of what they are planning.
This isn't overkill. This is the current plan: throw as much compute as possible at the problem and hope intelligence scales with compute.
Could you expand on this? Who are "the builders" here? You mean the model developers? I don't see how this situation can be "amazing" for the builders - developers will just get a wage out of their work.
I agree though that the returns on hardware rapidly diminish.
The US Supreme Court seems determined to make sure that big regulatory hammers are not going to be dropping, from what I can tell.
What about when NAR settled the price collusion charge? So, cartel or not, times do change.
A good real estate agent can guide people through this process, advising them on selling at the right price while sparing them as much stress as possible, often during an extremely difficult time in their life, such as going through a divorce or breakup. They of course also help keep buyers interested while the seller is making up their mind about the correct offer to take.
I find your comment ignorant in so many ways. Maybe have some respect?
It takes a long time for cultures to shift and for people to start to trust information systems to entirely replace high touch stuff like that. And at some level there will always be some white glove service on top for special cases.
https://hai.stanford.edu/news/ai-trial-legal-models-hallucin...
Largely depends on how much money the client has.
Human lawyers fail by not being very zealous, by being (most of them) very average, by not having enough time to spend on any given filing, and by not having sufficient research skills. So really, depth-of-knowledge and talent. They generally won't get things wrong per se, but just won't find a good answer.
AI gets it wrong by just making up whole cases that it wishes existed to match the arguments it came up with, or that you are hinting that you want, perhaps subconsciously. AI just wants to "please you" and creates something to fit. Its depth-of-knowledge is unreal, its "talent" is unreal, but it has to be checked over.
It's the same arguments with AI computer code. I had AI create some amazing functions last night but it kept hallucinating the name of a method call that didn't exist. Luckily with code it's more obvious to spot an error like that because it simply won't compile, and in this case I got luckier than usual, in that the correct function did exist under another name.
Human imperfections are a family of failure-modes we have a gajillion years of experience in detecting, analyzing, preventing, and repairing. Quirks in ML models... not so much.
A quick thought-experiment to illustrate the difference: Imagine there's a self-driving car that is exactly half as likely to cause death or injury as a human driver. That's a good failure rate. The twist is that its major failure mode is totally alien, where units inexplicably attempt to chase-murder random pedestrians. It would be difficult to get people to accept that tradeoff.
It's one thing if a human makes a wrong financial decision or a wrong driving decision, it's another thing if a model distributed to ten million computers in the world makes that decision five million times in one second before you can notice it's happening.
It's why if your coworker makes a weird noise you ask what's wrong, if the industrial furnace you stand next to makes a weird noise you take a few steps back.
Meta has about 350,000 of these GPUs and a whole bunch of A100s. This means the ability to train 50 GPT-4 scale models every 90 days or 200 such models per year.
This level of overkill suggests to me that the core models will be commoditized to oblivion, making the actual profit margins from AI-centric companies close to 0, especially if Microsoft and Meta keep giving away these models for free.
This is actually terrible for investors, but amazing for builders (ironically).
The real value, methinks, is actually in control of the proprietary data used for training, which is the single most important factor for model output quality. And this is as much an issue for copyright lawyers as for software engineers once the big regulatory hammers start dropping to protect American workers.