> Are there specific use cases where o1's higher cost is justified anymore?
Long-tail stuff, perhaps. Most tasks don't resemble a programming benchmark. A newer model can thrive despite being small when there is a lot of training data, and for programming benchmarks, as for chess, there is a lot of it, in part because high-quality training data can be synthesized.