Usually the reason you'd want high network bandwidth is distributed training.
For inference you can probably get by with 2 GB/s, assuming you can split the layers up nicely (pipeline parallelism, where only the activations at each stage boundary cross the link). The interconnect can still be a bottleneck for inference, but only for networks with very large activations combined with big batch sizes, or if you are doing tensor-level parallelism.
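To see why pipeline-style splits are so cheap on bandwidth, here's a back-of-envelope sketch: per generated token, only one activation tensor crosses the link between stages. All the figures below (hidden size, batch size, tokens/sec, fp16 activations) are illustrative assumptions, not measurements from any particular model.

```python
def pipeline_link_gbps(batch_size, hidden_size, tokens_per_sec, dtype_bytes=2):
    """Rough GB/s needed to ship one layer's activations per generated token.

    Assumes pipeline parallelism: the only cross-device traffic is the
    activation tensor at the stage boundary (batch x hidden, fp16 by default).
    """
    bytes_per_token = batch_size * hidden_size * dtype_bytes
    return bytes_per_token * tokens_per_sec / 1e9

# Hypothetical 7B-class model: hidden ~4096, fp16, batch 8, 50 tok/s
print(pipeline_link_gbps(8, 4096, 50))  # a tiny fraction of 2 GB/s
```

Tensor-level parallelism is different: every layer does all-reduces over activations, so the traffic scales with the number of layers and the link gets hit far more often, which is why it's the case where interconnect speed actually matters.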