surfmike
How would you describe it instead? Curious and learning.
Google does everything, both inference and training, on their TPUs.
Inference is easier, since the person deploying a model knows the architecture ahead of time and therefore can write custom code for their particular model.
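A minimal JAX sketch of what knowing the architecture ahead of time buys you (my own illustration, not Google's internal code; the model and shapes are made up): the whole fixed model can be traced once and compiled into a single fused TPU/XLA program.

    import jax
    import jax.numpy as jnp

    # Hypothetical fixed two-layer MLP: the architecture is frozen at
    # deployment time, so the compiler sees the entire graph at once.
    def mlp(params, x):
        w1, w2 = params
        return jnp.dot(jax.nn.relu(jnp.dot(x, w1)), w2)

    # jit traces the known ops and shapes once and emits one compiled
    # executable -- the "custom code for their particular model" case.
    mlp_compiled = jax.jit(mlp)

    key = jax.random.PRNGKey(0)
    params = (jax.random.normal(key, (128, 256)),
              jax.random.normal(key, (256, 10)))
    x = jnp.ones((32, 128))
    print(mlp_compiled(params, x).shape)  # (32, 10)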
When training, you want to be as flexible as possible. The framework and hardware shouldn't impose any particular architecture. That means supporting lots of kernels and combinations of kernels; miss one and you're out.
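And the training side of the same sketch (again just an illustration, with a made-up model): the framework can't assume anything about what ops the user composes, and every primitive that shows up needs both a forward kernel and a backward rule before gradients can flow through it.

    import jax
    import jax.numpy as jnp

    # Hypothetical user-defined model: the framework can't know ahead
    # of time which primitives will appear here.
    def model_fn(params, x):
        return jnp.tanh(x @ params) * jax.nn.sigmoid(x @ params)

    def loss(params, x, y):
        return jnp.mean((model_fn(params, x) - y) ** 2)

    # grad must differentiate through *every* primitive the user
    # composed; each needs a forward kernel plus a backward rule on the
    # hardware, and one missing combination breaks training entirely.
    grad_fn = jax.jit(jax.grad(loss))

    params = jnp.ones((4, 4))
    x = jnp.ones((2, 4))
    y = jnp.zeros((2, 4))
    print(grad_fn(params, x, y).shape)  # (4, 4)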