Very bold direction, and I love it. It looks like a lot of CUDA engineering expertise went into this. I'm wondering why the batch size is set to 1; I'd hope to see a comparison against real production setups with larger batch sizes. Also wondering how to extend it to other models, like MoE with expert parallelism, since a CUDA kernel isn't supported across GPUs?

saagarjha
Because people using it for interactive use run with batch size 1.
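
To illustrate the point (a minimal sketch, not from the project; the model call and its output shape are assumptions): in an interactive chat session only one request is in flight at a time, and each new token depends on the previous one, so the decode loop naturally runs with a single sequence in the batch.

    # Sketch of interactive autoregressive decoding with batch size 1.
    # `model` is a hypothetical callable returning next-token logits of
    # shape [batch, vocab]; here batch is always 1 because only one
    # user's request is being served at a time.
    import torch

    def interactive_decode(model, prompt_ids: torch.Tensor, max_new_tokens: int = 32) -> torch.Tensor:
        seq = prompt_ids.unsqueeze(0)                       # [1, seq_len] -> batch of one
        for _ in range(max_new_tokens):
            logits = model(seq)                             # [1, vocab]
            next_id = logits.argmax(dim=-1, keepdim=True)   # greedy pick, [1, 1]
            seq = torch.cat([seq, next_id], dim=1)          # append token, stay at batch 1
        return seq.squeeze(0)

Batching only helps when many independent requests can be stacked together; a single user's latency-sensitive session gives the server exactly one sequence to work on.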
