
The codebase of an LLM is the size of a high school exam project. There is little to no coding in machine learning. That is the sole reason why they are overvalued - any company can write its own in a flash. You only require hardware to train and run inference.

If it's so simple, why does GPT-4 perform better than almost everything else?
I think it's about having massive data pipelines and processes to clean huge amounts of data, increasing the signal-to-noise ratio, and then, as others are saying, scale: having enough GPU power to serve millions of users. When Stanford researchers trained Alpaca [1][2], the hack was to use GPT itself to generate the training data, if I'm not mistaken.
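(For illustration, here's a minimal sketch of that kind of teacher-generated training data, in the spirit of self-instruct/Alpaca rather than the actual Stanford pipeline, which used text-davinci-003 plus seed tasks and filtering. It assumes the OpenAI v1 Python client with an API key in the environment; the seed instructions and file name are made up.)

    # Toy self-instruct-style data generation: ask a stronger "teacher" model for
    # answers to instructions and dump (instruction, output) pairs as JSONL for
    # fine-tuning a smaller model. Model name and seeds are illustrative only.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    seed_instructions = [
        "Explain what a mixture-of-experts layer is in one paragraph.",
        "Write a Python function that reverses a linked list.",
    ]

    with open("synthetic_train.jsonl", "w") as f:
        for instruction in seed_instructions:
            resp = client.chat.completions.create(
                model="gpt-3.5-turbo",  # Alpaca used text-davinci-003 as the teacher
                messages=[{"role": "user", "content": instruction}],
            )
            record = {"instruction": instruction,
                      "output": resp.choices[0].message.content}
            f.write(json.dumps(record) + "\n")  # one training example per line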

But with compromises, as it was like applying lossy compression to an already compressed data set.

If any other organisation could invest the money in a high-quality data pipeline, then the results should be as good; at least that's my understanding.

[1] https://crfm.stanford.edu/2023/03/13/alpaca.html [2] https://newatlas.com/technology/stanford-alpaca-cheap-gpt/

I'm not saying it is simple in any way, but I do think part of having a competitive edge in AI, at least at this moment, is having access to ML hardware (AKA Nvidia silicon).

Adding more parameters tends to make the model better. With OpenAI having access to huge capital, they can afford to 'brute force' a better model. AFAIK right now OpenAI has the most compute power, which would partially explain why GPT-4 yields better results than most of the competition.
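(To put rough numbers on the "brute forcing" point, a back-of-the-envelope sketch using the common ~6·N·D rule of thumb for training FLOPs; the parameter/token counts and per-GPU throughput below are illustrative assumptions, not OpenAI's figures.)

    # Rough illustration of why more parameters means much more compute:
    # a common approximation is training FLOPs ~= 6 * N (parameters) * D (tokens).
    # All numbers below are illustrative assumptions, not OpenAI's actual figures.
    def training_flops(n_params: float, n_tokens: float) -> float:
        return 6 * n_params * n_tokens

    for n_params, n_tokens in [(7e9, 1e12), (70e9, 2e12), (1e12, 10e12)]:
        flops = training_flops(n_params, n_tokens)
        # Assume ~3e14 FLOP/s per accelerator (A100-class peak; real utilization is lower).
        gpu_years = flops / 3e14 / (3600 * 24 * 365)
        print(f"{n_params:.0e} params, {n_tokens:.0e} tokens -> "
              f"{flops:.1e} FLOPs (~{gpu_years:,.0f} GPU-years at peak)")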

Just having the hardware is not the whole story, of course; there is absolutely a lot of innovation and expertise coming from OpenAI as well.

I'm sure Google and Microsoft have access to all the hardware they need. OpenAI is doing the best job out there.
You're not really answering the question here.

Parent's point is that GPT-4 is better because they invested more money (was that ~$60M?) in training infrastructure, not because their core logic is more advanced.

I'm not arguing for one or the other, just restating parent's point.

Are you really saying Google can't spend $60M or much more to compete? Again, if it is as easy as spending money on compute, Amazon and Google would have just spent the money by now and Bard would be as good as ChatGPT, but for most things it is not even as good as GPT-3.5.
You should already be aware of the secret sauce of ChatGPT by now: MoE + RLHF. Making MoE profitable is a different story. But, of course, that is not the only part. OpenAI does very obvious things to make GPT-4 and GPT-4 Turbo better than other models, and this is hidden in the training data. Some of these obvious things have already been discovered, but some of them we just can't see yet. However, if you see how close Phind V7 34B is to the quality of GPT-4, you'll understand that the gap is not wide enough to eliminate the competition.
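(GPT-4's internals aren't public, so here is only a generic, textbook top-2 gated mixture-of-experts layer in PyTorch, to show what the "MoE" part refers to; the sizes and routing scheme are illustrative, not OpenAI's design.)

    # Generic top-k gated mixture-of-experts layer: a router picks k experts per
    # token and combines their outputs with the softmaxed router weights.
    # This is a textbook sketch, not a description of GPT-4's unpublished design.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoELayer(nn.Module):
        def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
            super().__init__()
            self.top_k = top_k
            self.gate = nn.Linear(d_model, n_experts)   # the router
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (tokens, d_model)
            scores = self.gate(x)                              # (tokens, n_experts)
            weights, idx = scores.topk(self.top_k, dim=-1)     # each token picks k experts
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e                      # tokens routed to expert e
                    if mask.any():
                        out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
            return out

    layer = MoELayer(d_model=64, d_hidden=256)
    print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])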
This is very much true. Competitive moats can be built on surprisingly small edges. I've built a tiny empire on top of a bug.
If they're "obvious", i.e. "easy to see", how come, as you say, we "can't see" them yet?

"Cannot see" ≠ "not easy to see"

That is the point: we often overlook the obvious stuff. It is something so simple and trivial that nobody sees it as a vital part. Something along the lines of "Textbooks are all you need."
The final codebase, yes. But ML is not like traditional software engineering. There is a 99% failure rate, so you are forgetting 100s of hours that go into:

(1) surveying literature to find that one thing that will give you a boost in performance,
(2) hundreds of notebooks trying various experiments,
(3) hundreds of tweaks and hacks with everything from data pre-processing, to fine-tuning and alignment, to tearing apart flash attention,
(4) beta and user testing,
(5) making all this run efficiently on the underlying infra hardware - by means of distillation, quantization, and various other means (see the sketch after this list),
(6) actually pipelining all this into something that can be served at hyperscale.
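(As a concrete example of what point (5) means in practice, here is a toy symmetric int8 weight quantization in NumPy; real systems use per-channel scales, calibration data, and GPTQ/AWQ-style methods, so this is only an illustration.)

    # Toy symmetric int8 weight quantization: trade a little precision for 4x
    # smaller weights (and faster inference kernels). Purely illustrative.
    import numpy as np

    def quantize_int8(w: np.ndarray):
        scale = np.abs(w).max() / 127.0                 # one scale for the whole tensor
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(4096, 4096).astype(np.float32)  # stand-in weight matrix
    q, scale = quantize_int8(w)
    err = np.abs(w - dequantize(q, scale)).mean()
    print(f"memory: {w.nbytes / 2**20:.0f} MiB -> {q.nbytes / 2**20:.0f} MiB, "
          f"mean abs error {err:.4f}")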
> you are forgetting 100s of hours

I would say thousands. Even for hobby projects - thousands of GPU hours and thousands of research hours a year.

And some luck is needed really.
Tell me you aren't in an LLM project without telling me.

Data and modeling is so much more than just coding. I wish it were like that, but it is not. The fact that it bears this much resemblance to alchemy is funny, but unfortunate.

Do you have a link to one please?
