Google is pretty invested in TPUs for their own workloads, but I fail to see any durable commitment to them as an external product. At best they're there to encourage standalone development of applications/frameworks that then get deployed on Google Cloud (IMHO of course).
AFAIK, apart from toy dev boards like this, you can't buy a TPU; you can only rent access to them in the cloud. I wouldn't want my company to rely on that. What if Google decides to lock you out? If you've adapted your workload to rely on TPUs, you'd be fucked.
What's the difference between Coral's production line of Edge TPU modules and chips [1] and Google's cloud TPU offering?
Note: I haven't tried sourcing these in production (100k+) quantities so I have no idea what guarantees that product line gives customers.
They're nothing alike at all, similar to how a low-end laptop GPU differs from a top-of-the-line NVIDIA datacenter offering. Google's Cloud TPU offering is among the strongest ML training hardware that exists; the edge devices simply support the same API.
The Edge TPU is rated at 4 TOPS of int8 throughput; a Cloud TPU v2 starts at 180 TFLOPS (bfloat16) per device and scales further from there.
Also, the Edge TPU draws 2-5 watts. Supposedly Cloud TPUs are more power efficient than GPUs too; for comparison, the ~14 TFLOPS 2080 regularly ran at 300 W.
Coral can only run inference, and is optimized for models using 8-bit integers (via quantization).
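For context, as far as I understand it, getting a model onto Coral means full-int8 post-training quantization through the TFLite converter before the edgetpu_compiler step. A minimal sketch of that flow, where "model" and "representative_data" are placeholders for an already-trained Keras model and a small calibration sample:

    import tensorflow as tf

    # representative_data stands in for a few hundred real inputs; the converter
    # uses them to calibrate the int8 quantization ranges.
    def representative_dataset():
        for sample in representative_data:
            yield [sample.astype("float32")]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Force full int8 so the Edge TPU compiler can map the ops onto the accelerator.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    with open("model_int8.tflite", "wb") as f:
        f.write(converter.convert())

    # The resulting .tflite file still has to go through edgetpu_compiler
    # before it runs on the Edge TPU itself.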
A full TPU v2/v3 can train models and uses 16/32-bit floats. They also support bfloat16, a 16-bit floating point type that came out of Google: it keeps float32's exponent range but trades away mantissa precision.
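The range-vs-precision trade-off is easy to see in TensorFlow itself; a quick illustration, with constants picked purely to show where each 16-bit format breaks down:

    import tensorflow as tf

    # bfloat16 keeps float32's 8-bit exponent, so it covers the same range...
    big = tf.constant(1e30)
    print(tf.cast(big, tf.float16).numpy())   # inf -- float16 tops out around 65504
    print(tf.cast(big, tf.bfloat16).numpy())  # ~1e30 -- still representable

    # ...but it only has a 7-bit mantissa, so it rounds much more aggressively.
    fine = tf.constant(1.001)
    print(tf.cast(fine, tf.bfloat16).numpy())  # 1.0 -- the extra 0.001 is lost
    print(tf.cast(fine, tf.float16).numpy())   # ~1.001 -- float16 keeps it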
And don't forget, TPUs are horrible at floating point math! The errors!
Yeah, I've been wondering about charts I've seen comparing TPU-trained model quality to GPU-trained model quality, like here [1], and whether the gap could be due to error correction. At the same time, training on gaming GPUs like the 1080 Ti or 2080 Ti is widely popular, even though they lack the ECC memory of the "professional" Quadro cards or the V100. I did think conventional DL wisdom said "precision doesn't matter" and "small errors don't matter", though.
I've noticed this difference in quality in my own experiments (TPU vs. gaming GPU), but I don't know for sure what the cause is. I never did notice a difference between gaming-GPU-trained models and Quadro-trained models. Have more info/links? Below the link is a sketch of how I'd try to isolate the precision variable.
1: https://github.com/tensorflow/gan/tree/master/tensorflow_gan...
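One way to poke at the precision question without buying a Quadro: train the same model under different Keras mixed-precision policies and compare final quality, holding data, seed, and hyperparameters fixed. A rough sketch, where the tiny model is just a stand-in for whatever you're actually training, and "mixed_bfloat16" really only makes sense on a TPU or a recent GPU:

    import tensorflow as tf

    def build_model():
        # Stand-in architecture; swap in the real model you're comparing.
        return tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation="relu"),
            # Keep the output layer in float32, as the mixed precision guide recommends.
            tf.keras.layers.Dense(10, dtype="float32"),
        ])

    # "mixed_bfloat16" is the TPU-style policy, "mixed_float16" the usual GPU one,
    # "float32" the full-precision baseline.
    for policy in ["float32", "mixed_float16", "mixed_bfloat16"]:
        tf.keras.mixed_precision.set_global_policy(policy)
        model = build_model()
        model.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=["accuracy"],
        )
        # model.fit(train_ds, validation_data=val_ds, epochs=...)  # plug in your data

If the quality gap tracks the policy rather than the hardware, that would point at precision rather than ECC or anything TPU-specific.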