
zhihaojia
Thanks for reading the post and the GitHub README. Supporting training is definitely feasible, but the benefit may not be as significant as for low-latency inference: training generally involves much larger kernels, so kernel launch overhead accounts for a much smaller fraction of total runtime.
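To make that concrete, here's a back-of-envelope sketch of the launch-overhead fraction in the two regimes. The ~5 us launch cost and the kernel durations are illustrative assumptions, not measurements from Mirage:

    # Rough estimate: what fraction of runtime is kernel launch overhead?
    # All numbers below are illustrative ballparks, not measured values.
    LAUNCH_OVERHEAD_US = 5.0

    scenarios = {
        "low-latency decode (many ~10 us kernels)": (100, 10.0),
        "training step (fewer, ~1000 us kernels)":  (100, 1000.0),
    }

    for name, (num_kernels, kernel_us) in scenarios.items():
        total_us = num_kernels * (kernel_us + LAUNCH_OVERHEAD_US)
        overhead_us = num_kernels * LAUNCH_OVERHEAD_US
        # Launch overhead dominates small kernels, vanishes for large ones.
        print(f"{name}: {overhead_us / total_us:.1%} launch overhead")

Under these assumptions, launch overhead is roughly a third of the decode step's runtime but well under 1% of the training step's, which is why fusing everything into one megakernel pays off far more for inference.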

Thanks for sharing the FlashDMoE work. Our next step is to support MoE models. Stay tuned!


bytepoet
Thanks for the input. That's very helpful to know.

I look forward to following Mirage's development.
