I'm sure an ML accelerator that doesn't support training will be great for applications like mass-produced self-driving cars. But for hobbyists - the kind of people who care about the difference between a $170 dev board and a $100 dev board - being unable to train is a pretty glaring omission.
Assuming that ratio holds, you'd maybe get 231 GFLOPS for training. The Nvidia 9800 GTX that I bought in 2008 gets 432 GFLOPS according to a quick Google search.
Hobbyists don't care about power efficiency for training, so buy any GPU made in the last 12 years instead, train on your desktop, and transfer the trained model to the board.
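Roughly, that workflow is: train a normal float model on your desktop GPU, post-training-quantize it to 8-bit TFLite, then run it through the Edge TPU compiler and copy the result to the board. A minimal sketch with TensorFlow (the toy MNIST model, calibration set, and file name are just placeholders, not anything Coral-specific):

    import tensorflow as tf

    # 1. Train a normal float32 model on the desktop GPU (toy MNIST example).
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])
    model.fit(x_train / 255.0, y_train, epochs=1)

    # 2. Post-training quantization to int8 TFLite (the Edge TPU only runs 8-bit ops).
    def representative_data():
        for sample in x_train[:100]:
            yield [sample[None, ...].astype("float32") / 255.0]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    with open("model_quant.tflite", "wb") as f:
        f.write(converter.convert())

    # 3. Compile for the Edge TPU on the desktop, then copy the output to the board:
    #    edgetpu_compiler model_quant.tflite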
See this paper for an example of interactive RL: https://arxiv.org/abs/1807.00412
Note: I haven't tried sourcing these in production (100k+) quantities so I have no idea what guarantees that product line gives customers.
Also, the Edge TPU is 2-5 watts. Supposedly Cloud TPUs are more power-efficient than GPUs, and the 14 TFLOPS 2080, for example, regularly ran at 300 W.
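Back-of-the-envelope, using Google's published ~4 TOPS spec for the Edge TPU and the GPU figures quoted above (this mixes int8 TOPS with float TFLOPS, so it's only a rough comparison):

    # Rough perf-per-watt comparison; int8 TOPS vs. float TFLOPS, so apples-to-oranges.
    edge_tpu_tops, edge_tpu_watts = 4.0, 2.0     # Google's spec: ~4 TOPS at ~2 W
    gpu_2080_tflops, gpu_2080_watts = 14.0, 300.0  # figures quoted above

    print(edge_tpu_tops / edge_tpu_watts)        # 2.0    TOPS per watt
    print(gpu_2080_tflops / gpu_2080_watts)      # ~0.047 TFLOPS per watt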
A full TPU v2/v3 can train models and supports 16- and 32-bit floats. They also use bfloat16, a Google Brain-originated 16-bit floating-point format that keeps float32's 8-bit exponent but cuts the mantissa to 7 bits, so it trades precision for range.
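TensorFlow can cast to bfloat16 on a plain CPU, so the trade-off is easy to see for yourself (purely illustrative, nothing TPU-specific):

    import tensorflow as tf

    # float16 keeps more mantissa bits; bfloat16 keeps more exponent bits.
    print(tf.cast(tf.constant(256.5), tf.float16))   # 256.5  (fits in float16's 10 mantissa bits)
    print(tf.cast(tf.constant(256.5), tf.bfloat16))  # 256    (rounded: only 7 mantissa bits)
    print(tf.cast(tf.constant(1e20), tf.float16))    # inf    (exceeds float16's range)
    print(tf.cast(tf.constant(1e20), tf.bfloat16))   # ~1e20  (float32's exponent range is kept)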
I've noticed this difference in model quality in my own experiments comparing TPU and gaming-GPU training, but I don't know for sure what the cause is. I never noticed a difference between models trained on a gaming GPU and models trained on a Quadro. Do you have more info/links?
1: https://github.com/tensorflow/gan/tree/master/tensorflow_gan...
https://blog.usejournal.com/google-coral-edge-tpu-vs-nvidia-...
...and Google is pretty invested in TPUs, since it uses lots of them in-house.
https://en.wikipedia.org/wiki/Tensor_Processing_Unit