Comment by xixihaha - Hacker Neue

xixihaha May 30, 2025

Very bold direction and I love it. It looks like a lot of CUDA engineering expertise went into this. Why is the batch size set to 1? I hope to see a comparison against real production workloads with larger batch sizes. I'm also wondering how to extend it to other models, such as MoE with expert parallelism. Is a single CUDA kernel spanning multiple GPUs not supported?

saagarjha Jun 7, 2025

Because people who use it interactively run at batch size 1.
