
Four-bit floats are not as useful as Nvidia would have you believe. Like structured sparsity, they're mainly a trick to make newer-gen cards look faster in the absence of an improvement in the underlying tech. If you're using them for NN inference, you have to carefully tune the weights to get good accuracy, and they offer nothing over fixed-point.
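
To make that concrete, here's a rough sketch (mine, assuming the e2m1 layout for fp4 and a plain symmetric int4 grid for fixed-point) that rounds a vector of weights onto both grids and compares the error:

    # Round weights to the nearest fp4 (assumed e2m1) value vs. a
    # symmetric int4 fixed-point grid, and compare the rounding error.
    import numpy as np

    # Positive representable values of e2m1 fp4 (sign handled separately).
    FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

    def quantize_fp4(w, scale):
        # Snap each weight to the nearest scaled fp4 value.
        s = np.sign(w)
        grid = FP4_GRID * scale
        idx = np.abs(np.abs(w)[:, None] - grid[None, :]).argmin(axis=1)
        return s * grid[idx]

    def quantize_int4(w, scale):
        # Symmetric fixed-point: 16 uniform levels, step = scale.
        q = np.clip(np.round(w / scale), -8, 7)
        return q * scale

    rng = np.random.default_rng(0)
    w = rng.normal(0, 0.05, size=4096)        # typical-ish NN weight spread

    fp4_scale = np.abs(w).max() / 6.0         # map max |w| to the top fp4 value
    int4_scale = np.abs(w).max() / 7.0        # map max |w| to the top int4 level

    err_fp4 = np.abs(w - quantize_fp4(w, fp4_scale)).mean()
    err_int4 = np.abs(w - quantize_int4(w, int4_scale)).mean()
    print(f"mean abs error  fp4: {err_fp4:.5f}   int4: {err_int4:.5f}")

The fp4 levels bunch up near zero and get coarse near the max, while the int4 grid is uniform, so which one rounds better depends entirely on the weight distribution and the scale you pick; that's the tuning.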

imtringued
The actual problem is that nobody uses these low-precision floats for training their models. When you do quantization, you are merely compressing the weights to minimize memory usage and to use memory bandwidth more efficiently. You still have to run the model at the original precision for the calculations, so nobody gives a damn about the low-precision floats for now.
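
For example, a weight-only scheme looks roughly like this (a minimal sketch, assuming per-row int8 scales; real kernels fuse the dequantize into the matmul):

    # Weight-only quantization sketch: weights are stored compressed,
    # but the matmul itself still runs at fp32.
    import numpy as np

    def quantize_weights(W):
        # Compress fp32 weights to int8 plus a per-row fp32 scale.
        scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
        W_q = np.round(W / scale).astype(np.int8)
        return W_q, scale

    def linear(x, W_q, scale):
        # Decompress on the fly; the math happens in fp32, not int8.
        W = W_q.astype(np.float32) * scale
        return x @ W.T

    rng = np.random.default_rng(0)
    W = rng.normal(0, 0.02, size=(256, 512)).astype(np.float32)
    x = rng.normal(size=(1, 512)).astype(np.float32)

    W_q, scale = quantize_weights(W)
    print("stored bytes:", W_q.nbytes + scale.nbytes, "vs original:", W.nbytes)
    print("max output error:", np.abs(linear(x, W_q, scale) - x @ W.T).max())

Storage drops to about a quarter, but the multiply still runs at the original precision, so the new low-precision float formats never enter the arithmetic.
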
Y_Y OP
That's not entirely true. Current-gen Nvidia hardware can use fp8, and the newly announced Blackwell can do fp4. Lots of existing specialized inference hardware uses int8, and some uses int4.

You're right that low-precision training still doesn't seem to work, presumably because you lose the smoothness required for SGD-type optimization.
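
A toy way to see it (my own sketch, assuming the weights live directly on an e2m1-style grid with no higher-precision master copy):

    # Why tiny SGD steps get lost when the weights themselves sit on a
    # coarse grid (an e2m1-style fp4 grid is assumed here).
    import numpy as np

    FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
    GRID = np.concatenate([-FP4_GRID[::-1], FP4_GRID])   # signed values

    def snap(w):
        # Round a scalar weight to the nearest representable value.
        return GRID[np.abs(GRID - w).argmin()]

    w = snap(1.0)
    lr, grad = 1e-3, 0.3                  # a perfectly ordinary SGD step
    for step in range(100):
        w = snap(w - lr * grad)           # every update is re-quantized
    print("weight after 100 steps:", w)   # never moves: each step rounds back

Every ordinarily-sized update rounds straight back to the same grid point, which is part of why practical mixed-precision training keeps a higher-precision master copy of the weights.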
