That breakthrough was only 6 years ago!
https://openai.com/index/better-language-models/
> We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text...
That was big news. I guess this is because it's quite hard for most people to perceive the enormous difficulty gulf between "generate a coherent paragraph" and "create a novel, funny joke".
- It can play chess -> but not at a serious level
- It can beat most people -> but not grandmasters
- It can beat grandmasters -> but it can’t play go
…etc, etc
In a way I guess it’s good that there is always some reason the current version isn’t “really” impressive, as it drives innovation.
But as someone more interested in a holistic understanding of the world than in proving any particular point, I find it frustrating to see the goalposts moved without even acknowledging how much work and progress went into meeting them at their previous location.
Half the HN front page for the past few years has been nothing but acknowledgment of LLM progress in sundry ways. I wish we'd actually stop for a second; it's all people seem to want to talk about anymore.
Goes to show that "bad at jokes" is not a fundamental limitation of LLMs, and that there are still performance gains from increasing model scale, as expected, though not exactly the same gains you get from reasoning or RLVR.