Thanks for reading the post and the GitHub README. Supporting training is definitely feasible, but the benefit may not be as significant as for low-latency inference, since training generally involves much larger kernels, which makes kernel launch overhead less of a bottleneck.
Thanks for sharing the FlashDMoE work. Our next step is to support MoE models. Stay tuned!