Compiling LLMs into a MegaKernel: A path to low-latency inference

314 points Jun 19, 2025
