
refibrillator
Hi author(s), the on-GPU interpreter approach looks like a promising path forward. Have you seen this strikingly similar concurrent work?

https://www.hackerneue.com/item?id=44111673

I find it curious that fundamentals of the CUDA programming model (e.g., kernel launches) are being subverted in favor of fine-grained, task-based parallelism that ends up using the hardware more effectively. It makes me wonder if CUDA has been holding us back in some ways.

What are the chances we see your work land in PyTorch as an experimental backend?

Awesome stuff, thanks for sharing.

P.S. minor typo, your first two paragraphs under part 1 are nearly identical.


zhihaojia
Thanks for the great feedback! Stanford's MegaKernel project tackles a similar challenge but focuses on manual CUDA implementation, while MPK takes a compiler-driven approach: users express their LLMs at the PyTorch level, and MPK automatically compiles them into optimized megakernels. Our goal is to make programming megakernels much more accessible.

I completely agree that CUDA can be a limiting factor, especially for latency-sensitive workloads. As GPUs are becoming larger and faster, it's increasingly difficult to write standalone kernels that fully utilize hardware resources—particularly when optimizing for low latency with small batch sizes.
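To make the contrast concrete, here is a tiny pure-Python sketch of the fine-grained task scheduling idea: each task runs as soon as its own dependencies finish, rather than waiting at a global barrier between kernel launches. (Names and structure are purely illustrative, not MPK's actual API.)

```python
from collections import deque

def run_task_graph(tasks, deps):
    """Run each task as soon as its own inputs are ready,
    instead of synchronizing at a global inter-kernel barrier."""
    remaining = {t: len(deps.get(t, ())) for t in tasks}
    dependents = {}
    for t, ds in deps.items():
        for d in ds:
            dependents.setdefault(d, []).append(t)
    ready = deque(t for t, n in remaining.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        tasks[t]()  # execute the fine-grained task
        for succ in dependents.get(t, []):
            remaining[succ] -= 1
            if remaining[succ] == 0:
                ready.append(succ)
    return order

# Two "layers" split into per-tile tasks: tile b1 of layer B waits only
# on tile a1 of layer A, not on all of layer A finishing.
log = []
tasks = {name: (lambda n=name: log.append(n)) for name in ["a1", "a2", "b1", "b2"]}
deps = {"b1": ["a1"], "b2": ["a2"]}
order = run_task_graph(tasks, deps)
print(order)  # b1 can start before a2 has to finish in a parallel setting
```

On a GPU this dependency tracking is what the megakernel's on-chip scheduler does with counters in shared or global memory; the point is that per-task dependencies expose overlap that whole-kernel launch boundaries hide.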

> What are the chances we see your work land in PyTorch as an experimental backend?

We're definitely excited about that direction. We believe MPK can help PyTorch support megakernel generation, and we’re actively exploring how to make that happen. Stay tuned!

> P.S. minor typo, your first two paragraphs under part 1 are nearly identical.

Thanks for pointing it out. I meant to remove the duplicate paragraph when finalizing the post.

pavelstoev
Hi author, thank you very much for the clear and relatively easy-to-understand MPK overview. Could you please also comment on the similarity of your project to Hidet (https://pytorch.org/blog/introducing-hidet/)?

Thank you !
