This point seems underappreciated by the AGI proponents. If one of our models suddenly has a brainwave and becomes generally intelligent, it would realize that it is awash in a morass of contradictory facts. It would be more than the sum of its training data. The fact that all models at present credulously accept their training suggests to me that we aren’t even close to AGI.
In the short term I think two things will happen: 1) we will live with the reduced usefulness of models trained on data that has been poisoned, and 2) the best model developers will continue to work hard to curate good data. A colleague at Amazon recently told me that curation and post hoc supervised tweaks (fine tuning, etc) are now major expenses for the best models. His prediction was that this expense will drive out the smaller players in the next few years.
Is this true?
So many on HN make these absolute statements about how LLMs operate and what they can and can't do that it seems like they fail this test harder than anyone else.
It is just autocomplete.
They can't generalize.
They can't do anything not in their training set.
All of which are false.
This is the entirety of human history, humans create this data, we sink ourselves into it. It's wishful thinking that it would change.
> 2) the best model developers will continue to work hard to curate good data.
I'm not sure that this matters much.
Leave these problems in place and you end up with an untrustworthy system, one where skill and diligence become differentiators... Step back from the hope of AI and you get amazing ML tooling that can 10x the most proficient operators.
> supervised tweaks (fine tuning, etc) are now major expenses for the best models. His prediction was that this expense will drive out the smaller players in the next few years.
This kills more refined AI. It is the same problem that killed "expert systems" where the cost of maintaining them and keeping them current was higher than the value they created.
Let's take something that has been in the news recently: https://abcnews.go.com/Business/wireStory/investors-snap-gro...
"Nearly 27% of all homes sold in the first three months of the year were bought by investors -- the highest share in at least five years, according to a report by real estate data provider BatchData."
That sounds like a lot... and people are rage-baited into yelling about housing and how it's unaffordable. They point their fingers at corporations.
If you go look at the real report it paints a different picture: https://investorpulse1h25.batchdata.io/?mf_ct_campaign=grayt... -- and one that is woefully incomplete because of how the data is aggregated.
Ultimately, all that information is pointless, because the real underlying trend has been unmovable for 40-something years: https://fred.stlouisfed.org/series/RSAHORUSQ156S
> every time they unthinkingly repeat propaganda
How do you separate propaganda from perspective, facts from feelings? People are already bad at this, and the machines were already well soiled by the data from humans. Truth, in an objective form, is rare, and even it can often change.