And don't forget, TPUs are horrible at floating point math! The errors!
Yeah, I've been wondering about charts I've seen comparing TPU-trained model quality to GPU-trained model quality, like here [1], and whether the gap could be down to error correction. At the same time, training on gaming GPUs like the 1080 Ti or 2080 Ti is hugely popular, even though they lack the ECC memory of the "professional" Quadro cards or the V100. I did think conventional DL wisdom was "precision doesn't matter" and "small errors don't matter", though (the toy example below the link shows one way they can matter).
I've noticed this quality difference in my own experiments, TPU vs. gaming GPU, but I don't know for sure what the cause is. I never noticed a difference between gaming-GPU-trained and Quadro-trained models, though. Do you have more info/links?
1: https://github.com/tensorflow/gan/tree/master/tensorflow_gan...
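For what it's worth, whether "small errors matter" depends a lot on where they show up. Here's a minimal sketch of one failure mode (plain NumPy, nothing to do with the linked repo): a naive 16-bit accumulator silently drops contributions that a 32-bit one keeps.

    import numpy as np

    # 100,000 small gradient-like values whose true sum is 10.0
    small = np.full(100_000, 1e-4)

    # Naive sequential accumulation in float16: once the running sum is
    # much larger than each addend, the additions round away to nothing.
    acc = np.float16(0.0)
    for v in small.astype(np.float16):
        acc += v

    # Same data, float32 accumulator.
    fp32_sum = small.astype(np.float32).sum()

    print("float16 running sum:", float(acc))       # stalls far below 10.0
    print("float32 sum:        ", float(fp32_sum))  # ~10.0

Mixed-precision setups sidestep exactly this by keeping master weights and accumulations in float32 even when the multiplies are done in a 16-bit type.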
A full TPU v2/v3 can train models using 16- and 32-bit floats. The 16-bit type is a format that originated at Google, bfloat16, with reduced mantissa precision but the same exponent range as float32.
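A quick sketch of that trade-off (my own toy code, not any TPU API): bfloat16 is essentially just the top 16 bits of a float32 bit pattern, so truncating the low bits shows roughly what it keeps and what it drops (real conversions round to nearest rather than truncate).

    import struct

    def bf16_roundtrip(x: float) -> float:
        # Keep only the top 16 bits (sign, 8 exponent bits, 7 mantissa bits)
        # of the float32 representation of x.
        bits = struct.unpack("<I", struct.pack("<f", x))[0]
        return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

    for v in [1.0, 3.14159265, 1e-3, 1e38]:
        # 1e38 would overflow IEEE float16 (max ~65504) but survives here,
        # while 3.14159265 loses its low mantissa bits (-> 3.140625).
        print(v, "->", bf16_roundtrip(v))

So compared to IEEE float16 you get fewer mantissa bits (7 vs. 10) but float32's full dynamic range, which is why training in it rarely over/underflows; the TPU matrix units also accumulate the bfloat16 products in float32.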