If you want to try what Karpathy is describing live today, here's a demo I wrote a few months ago: https://universal.oroborus.org/
It takes mouse clicks, sends them to the LLM, and asks it to render static HTML+CSS for the next output frame. HTML+CSS is basically a JPEG here: the original implementation actually WAS JPEG, but diffusion models can't yet render text accurately enough.
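The core loop is tiny. Here's a minimal sketch of it, assuming an OpenAI-style chat API; the system prompt, model name, and state handling are my guesses for illustration, not the demo's actual implementation:

```python
# Sketch of the click -> LLM -> HTML frame loop (assumptions noted inline).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are a computer. Render the application's current screen as a "
    "single static HTML+CSS document. No JavaScript. Respond with HTML only."
)

def next_frame(prev_html: str, click_x: int, click_y: int) -> str:
    """Send the previous frame plus a click event; get the next frame back."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption; any capable chat model would do
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": (
                f"Previous frame:\n{prev_html}\n\n"
                f"The user clicked at ({click_x}, {click_y}). "
                "Render the next frame."
            )},
        ],
    )
    return response.choices[0].message.content

# Usage: start from a blank "desktop" and feed clicks in a loop.
frame = "<html><body><h1>Desktop</h1></body></html>"
frame = next_frame(frame, 120, 80)
```

The model itself holds all application state implicitly, frame to frame; there's no program underneath, which is the whole point.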
My conclusion from building this project and interacting with the result: if LLMs keep scaling in performance and cost, programming languages are going to fade away. The long-term future won't be LLMs writing code; it'll be LLMs doing the computation directly.