commakozzi
It does feel like it has to be real. I've noticed it since ChatGPT launched with GPT-3.5: once it hit the news and public demands were made to "censor" its output to limit biases, etc. (not inherently a problem for society to ask of LLMs, but it obviously affects the output). Whatever workflow OpenAI and the others apply seems to land post-release somehow? I'm ignorant and just speculating, but I've noticed it with literally every model release: it starts strong, then ends up feeling less capable days, weeks, months after. I'm sure some of it could be the parallelization of processing needed to serve the large number of requests, with more and more traffic spreading it thin?
> I'm sure some of it could be the parallelization of processing needed to serve the large number of requests, with more and more traffic spreading it thin?
Even if that's the case, benchmarks should be run at scale too if the models degrade under load. Otherwise the benchmarks are just a lie unless you have access to an unconstrained version of the model.
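To make "benchmark at scale" concrete, here's a rough sketch: run the same eval set against the serving endpoint at low and high concurrency and compare accuracy and latency. Everything here is a placeholder (`query_model` is a stand-in for a real API client, and the eval set is a toy), not any vendor's actual API:

```python
import asyncio
import random
import time

# Hypothetical stand-in for a real API call; swap in an actual client here.
async def query_model(prompt: str) -> str:
    await asyncio.sleep(random.uniform(0.05, 0.2))  # simulated network/inference latency
    return "42"  # placeholder completion

async def run_benchmark(tasks: list[tuple[str, str]], concurrency: int) -> dict:
    """Run (prompt, expected) pairs against the model at a given concurrency level."""
    sem = asyncio.Semaphore(concurrency)
    latencies: list[float] = []
    correct = 0

    async def one(prompt: str, expected: str) -> None:
        nonlocal correct
        async with sem:
            start = time.perf_counter()
            answer = await query_model(prompt)
            latencies.append(time.perf_counter() - start)
            if answer.strip() == expected:
                correct += 1

    await asyncio.gather(*(one(p, e) for p, e in tasks))
    return {
        "concurrency": concurrency,
        "accuracy": correct / len(tasks),
        "p50_latency_s": sorted(latencies)[len(latencies) // 2],
    }

async def main() -> None:
    tasks = [("What is 6 * 7?", "42")] * 200  # toy eval set
    # Compare a near-idle baseline against heavy parallel load.
    for c in (1, 100):
        print(await run_benchmark(tasks, concurrency=c))

if __name__ == "__main__":
    asyncio.run(main())
```

If quality really does drop under load (quantization, batching shortcuts, routing to smaller models, whatever the cause), the high-concurrency run should show it; if the published numbers only ever come from the near-idle case, they don't describe what users actually get.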