
imtringued
Google does everything, both inference and training, on their TPUs.

Inference is easier, since the person deploying a model knows the architecture ahead of time and therefore can write custom code for their particular model.
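As a rough sketch of why that helps, in JAX (the model, shapes, and parameter names here are hypothetical): once the architecture and input shapes are fixed at deployment time, the whole model can be JIT-compiled into one specialized program for the hardware.

    import jax
    import jax.numpy as jnp

    # Hypothetical fixed model: architecture and shapes are known at
    # deployment time, so XLA can compile one specialized executable.
    def mlp(params, x):
        h = jax.nn.gelu(x @ params["w1"])
        return h @ params["w2"]

    key = jax.random.PRNGKey(0)
    params = {
        "w1": jax.random.normal(key, (512, 2048)),
        "w2": jax.random.normal(key, (2048, 512)),
    }

    # jit specializes on the concrete shapes/dtypes it first sees; the
    # compiled program then serves every subsequent request unchanged.
    serve = jax.jit(mlp)
    out = serve(params, jnp.ones((8, 512)))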

When training, you want to be as flexible as possible. The framework and hardware should not impose any particular architecture. This means lots of kernels and combinations of kernels. Miss one and you're out.
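A minimal sketch of what that flexibility demands, assuming JAX-style autodiff and compilation (the loss function here is a deliberately arbitrary invention): the framework has to differentiate and compile whatever op combination a researcher writes, with no hand-fused kernel for that exact composition.

    import jax
    import jax.numpy as jnp

    # An arbitrary, researcher-invented combination of ops. The framework
    # must differentiate and compile this even though no bespoke fused
    # kernel exists for this exact pairing.
    def loss(w, x, y):
        h = jnp.tanh(x @ w) * jnp.cos(x)   # unusual pairing on purpose
        return jnp.mean((h.sum(axis=-1) - y) ** 2)

    grad_fn = jax.jit(jax.grad(loss))
    g = grad_fn(jnp.ones((16, 16)), jnp.ones((4, 16)), jnp.zeros((4,)))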


throwawaymaths
> Miss one and you're out.

Well, these days, since everything is a transformer, your pool of choices is less daunting, and there's only about four or five places where someone might get clever.
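To make that concrete, a sketch of a transformer block with the usual variation points pulled out as plug-in functions; the names and implementations here are hypothetical, but attention, normalization, activation, and positional encoding are roughly where the cleverness concentrates.

    import jax
    import jax.numpy as jnp

    def rms_norm(x):                 # norm choice (vs LayerNorm)
        return x / jnp.sqrt(jnp.mean(x**2, axis=-1, keepdims=True) + 1e-6)

    def attention(q, k, v):          # attention choice (vs MQA/GQA, sliding
        scores = q @ k.T / jnp.sqrt(q.shape[-1])  # window, etc.); positional
        return jax.nn.softmax(scores, axis=-1) @ v  # encoding would go here

    def block(x, w_up, w_down, norm=rms_norm, act=jax.nn.gelu):
        h = x + attention(norm(x), norm(x), norm(x))
        return h + act(norm(h) @ w_up) @ w_down   # activation choice

    out = block(jnp.ones((4, 8)), jnp.ones((8, 32)), jnp.ones((32, 8)))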
