> I'm sure some of it could be in the parallelization of processing that has to occur to service the large number of requests, and more and more traffic is spreading it thin?
Even if this is the case, benchmarks should also be run at scale if the models suffer from symptoms of scale. Otherwise the benchmarks are just a lie, unless you have access to an unconstrained version of the model.