It's surprising to me that the field is willing to invest this much in mega-kernels, but not models that generate multiple tokens in parallel...

liuliu
It is hard to justify a tens-of-millions investment in training just to make a model faster, without any idea how it will score on benchmarks. It is easier to justify keeping the model intact and spending extra millions to make it faster through exotic means (megakernels).

There has been some niche research on parallel token generation as of late, though...
