Preferences

fancyfredbot parent
Usually the reason you'd want the network bandwidth would be for distributed training.

For inference you can probably get by with 2GB/s assuming you can split the layers up nicely.

The interconnect can be a bottleneck for inference but only for networks with loads of activations and large batch sizes, or if you are doing tensor level parallelism.


This item has no comments currently.