crmi
I've got a working theory that models perform differently depending on the time of day they're used. As in, during US working hours they don't work as well due to high load.
When used at off-peak hours, not only are they (obviously) snappier, but the outputs appear to be of a higher standard. I've thought this for a while, but I'm noticing it with Claude4 [thinking] recently. Textbook case of anecdata, of course.
Interesting thought, if nothing else. Unless I misunderstand, it would be easy to run a study to see if this is true: use the API to send prompts that have a definite answer, varying each one slightly (to avoid the caches), then run that once per hour for a week and see whether the accuracy oscillates.
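
A rough sketch of what that study could look like, in Python. Everything here is an assumption for illustration: `ask_model` is a hypothetical callable that you would write yourself to wrap whatever API you're testing (prompt string in, reply string out), and the multiplication questions are just one easy way to get prompts that differ every time but still have a checkable answer.

```python
import random
from collections import defaultdict
from datetime import datetime, timezone


def make_question(rng):
    """Build a fresh prompt with a known answer (random operands defeat caching)."""
    a, b = rng.randint(100, 999), rng.randint(100, 999)
    return f"What is {a} * {b}? Reply with only the number.", str(a * b)


def run_batch(ask_model, n_questions, rng, now=None):
    """Ask n_questions and return (UTC hour, number correct, number asked).

    ask_model is a hypothetical wrapper around the API under test.
    """
    now = now or datetime.now(timezone.utc)
    correct = 0
    for _ in range(n_questions):
        prompt, expected = make_question(rng)
        if ask_model(prompt).strip() == expected:
            correct += 1
    return now.hour, correct, n_questions


def accuracy_by_hour(batches):
    """Pool batches from many days into one accuracy figure per UTC hour."""
    totals = defaultdict(lambda: [0, 0])
    for hour, correct, asked in batches:
        totals[hour][0] += correct
        totals[hour][1] += asked
    return {hour: c / n for hour, (c, n) in sorted(totals.items())}
```

Call `run_batch` once an hour from cron or similar, append each result to a file, and feed the week's worth into `accuracy_by_hour`. Three-digit multiplication may be too easy or too hard for a given model; the questions want to sit at a difficulty where accuracy is well below 100%, otherwise there is nothing left to oscillate.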