
The codebase of an LLM is the size of a high school exam project. There is little to no coding in machine learning. That is the sole reason why they are overvalued - any company can write its own in a flash. You only require hardware to train and run inference.

If it's so simple, why does GPT-4 perform better than almost everything else?
I think it's about having massive data pipelines and processes to clean huge amounts of data, increasing the signal-to-noise ratio, and then, as others are saying, scale: having enough GPU power to serve millions of users. When Stanford researchers trained Alpaca [1][2], the hack was to use GPT itself to generate the training data, if I'm not mistaken.
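(For illustration, here's a minimal sketch of that kind of teacher-generated training data, in the spirit of self-instruct/Alpaca rather than the actual Stanford pipeline, which used text-davinci-003 plus seed tasks and filtering. It assumes the OpenAI v1 Python client with an API key in the environment; the seed instructions and file name are made up.)

    # Toy self-instruct-style data generation: ask a stronger "teacher" model for
    # answers to instructions and dump (instruction, output) pairs as JSONL for
    # fine-tuning a smaller model. Model name and seeds are illustrative only.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    seed_instructions = [
        "Explain what a mixture-of-experts layer is in one paragraph.",
        "Write a Python function that reverses a linked list.",
    ]

    with open("synthetic_train.jsonl", "w") as f:
        for instruction in seed_instructions:
            resp = client.chat.completions.create(
                model="gpt-3.5-turbo",  # Alpaca used text-davinci-003 as the teacher
                messages=[{"role": "user", "content": instruction}],
            )
            record = {"instruction": instruction,
                      "output": resp.choices[0].message.content}
            f.write(json.dumps(record) + "\n")  # one training example per line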

But with compromises, as it was like applying lossy compression to an already compressed data set.

If any other organisation could invest the money in a high-quality data pipeline, then the results should be as good; at least that's my understanding.

[1] https://crfm.stanford.edu/2023/03/13/alpaca.html [2] https://newatlas.com/technology/stanford-alpaca-cheap-gpt/

I'm not saying it is simple in any way, but I do think part of having a competitive edge in AI, at least at this moment, is having access to ML hardware (AKA Nvidia silicon).

Adding more parameters tends to make the model better. With OpenAI having access to huge capital, they can afford to 'brute force' a better model. AFAIK right now OpenAI has the most compute power, which would partially explain why GPT-4 yields better results than most of the competition.
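(To put rough numbers on the "brute forcing" point, a back-of-the-envelope sketch using the common ~6·N·D rule of thumb for training FLOPs; the parameter/token counts and per-GPU throughput below are illustrative assumptions, not OpenAI's figures.)

    # Rough illustration of why more parameters means much more compute:
    # a common approximation is training FLOPs ~= 6 * N (parameters) * D (tokens).
    # All numbers below are illustrative assumptions, not OpenAI's actual figures.
    def training_flops(n_params: float, n_tokens: float) -> float:
        return 6 * n_params * n_tokens

    for n_params, n_tokens in [(7e9, 1e12), (70e9, 2e12), (1e12, 10e12)]:
        flops = training_flops(n_params, n_tokens)
        # Assume ~3e14 FLOP/s per accelerator (A100-class peak; real utilization is lower).
        gpu_years = flops / 3e14 / (3600 * 24 * 365)
        print(f"{n_params:.0e} params, {n_tokens:.0e} tokens -> "
              f"{flops:.1e} FLOPs (~{gpu_years:,.0f} GPU-years at peak)")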

Just having the hardware is not the whole story, of course; there is absolutely a lot of innovation and expertise coming from OpenAI as well.

I'm sure Google and Microsoft have access to all the hardware they need. OpenAI is doing the best job out there.
You're not really answering the question here.

Parent's point is that GPT-4 is better because they invested more money (was that ~$60M?) in training infrastructure, not because their core logic is more advanced.

I'm not arguing for one or the other, just restating parent's point.

Are you really saying Google can't spend $60M or much more to compete? Again, if it is as easy as spending money on compute, Amazon and Google would have just spent the money by now and Bard would be as good as ChatGPT, but for most things it is not even as good as GPT-3.5.
You should already be aware of the secret sauce of ChatGPT by now: MoE + RLHF. Making MoE profitable is a different story. But, of course, that is not the only part. OpenAI does very obvious things to make GPT-4 and GPT-4 Turbo better than other models, and this is hidden in the training data. Some of these obvious things have already been discovered, but some of them we just can't see yet. However, if you see how close Phind V7 34B is to the quality of GPT-4, you'll understand that the gap is not wide enough to eliminate the competition.
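(GPT-4's internals aren't public, so here is only a generic, textbook top-2 gated mixture-of-experts layer in PyTorch, to show what the "MoE" part refers to; the sizes and routing scheme are illustrative, not OpenAI's design.)

    # Generic top-k gated mixture-of-experts layer: a router picks k experts per
    # token and combines their outputs with the softmaxed router weights.
    # This is a textbook sketch, not a description of GPT-4's unpublished design.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoELayer(nn.Module):
        def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
            super().__init__()
            self.top_k = top_k
            self.gate = nn.Linear(d_model, n_experts)   # the router
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (tokens, d_model)
            scores = self.gate(x)                              # (tokens, n_experts)
            weights, idx = scores.topk(self.top_k, dim=-1)     # each token picks k experts
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e                      # tokens routed to expert e
                    if mask.any():
                        out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
            return out

    layer = MoELayer(d_model=64, d_hidden=256)
    print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])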
This is very much true. Competitive moats can be built on surprisingly small edges. I've built a tiny empire on top of a bug.
If they're "obvious", i.e. "easy to see", how come, as you say, we "can't see" them yet?

"Cannot see" ≠ "not easy to see"

That is the point: we often overlook the obvious stuff. It is something so simple and trivial that nobody sees it as a vital part. Something along the lines of "Textbooks are all you need."
The final codebase, yes. But ML is not like traditional software engineering. There is a 99% failure rate, so you are forgetting 100s of hours that go into:

(1) surveying literature to find that one thing that will give you a boost in performance,
(2) hundreds of notebooks trying various experiments,
(3) hundreds of tweaks and hacks with everything from data pre-processing, to fine-tuning and alignment, to tearing apart flash attention,
(4) beta and user testing,
(5) making all this run efficiently on the underlying infra hardware - by means of distillation, quantization, and various other means (see the sketch after this list),
(6) actually pipelining all this into something that can be served at hyperscale.
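(As a concrete example of what point (5) means in practice, here is a toy symmetric int8 weight quantization in NumPy; real systems use per-channel scales, calibration data, and GPTQ/AWQ-style methods, so this is only an illustration.)

    # Toy symmetric int8 weight quantization: trade a little precision for 4x
    # smaller weights (and faster inference kernels). Purely illustrative.
    import numpy as np

    def quantize_int8(w: np.ndarray):
        scale = np.abs(w).max() / 127.0                 # one scale for the whole tensor
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(4096, 4096).astype(np.float32)  # stand-in weight matrix
    q, scale = quantize_int8(w)
    err = np.abs(w - dequantize(q, scale)).mean()
    print(f"memory: {w.nbytes / 2**20:.0f} MiB -> {q.nbytes / 2**20:.0f} MiB, "
          f"mean abs error {err:.4f}")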
> you are forgetting 100s of hours

I would say thousands. Even for hobby projects - thousands of GPU hours and thousands of research hours a year.

And some luck is needed really.
Tell me you aren't in an LLM project without telling me.

Data and modeling is so much more than just coding. I wish it were like that, but it is not. The fact that it bears this much resemblance to alchemy is funny, but unfortunate.

Do you have a link to one please?
