Preferences

This world model talk is interesting, and Yann Lecunn has broached on the same topic, but the fact is there are video diffusion models that are quite good at representing the "video world" and even counterfactually and temporally coherently generating a representation of that "world" under different perturbations.

In fact you can go to a SOTA LLM today, and it will do quite well at predicting the outcomes of basic counterfactual scenarios.

Animal brains such as our own have evolved to compress information about our world to aide in survival. LLMs and recent diffusion/conditional flow matching models have been quite successful in compressing the "text world" and the "pixel world" to score good loss metrics on training data.

It's incredibly difficult to compress information without have at least some internal model of that information. Whether that model is a "world model" that fits the definition of folks like Sutton and LeCunn is semantic.


Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal