Of course, DeepSeek was forced to take the optimisation approach, but it got to the end in time to stake a claim. So YMMV.
The real bitter lesson in AI is that we don't really know what we're doing. We're hacking on models, looking for architectures that train well, but we don't fully understand why they work. Because we don't fully understand them, we can't design anything optimal or know how good a solution can possibly get.
Well, technically, that's not true: the entire idea behind complexity theory is that there are some tasks you can't throw more hardware at, at least not for interesting problem sizes or remotely feasible amounts of hardware.
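To put a rough number on it, a back-of-the-envelope sketch (the 1e18 ops/sec figure is an optimistic stand-in for a large cluster, not a real benchmark):

    SECONDS_PER_YEAR = 3.15e7

    # Brute force over 2^n candidates at an assumed 1e18 ops/sec.
    for n in (60, 80, 100):
        years = 2**n / 1e18 / SECONDS_PER_YEAR
        print(n, f"{years:.1e} years")

At n = 60 it's about a second; at n = 100 it's tens of thousands of years. Exponential scaling eats any feasible hardware budget.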
I wonder if we'll reach a similar situation in AI where "throw more context/layers/training data at the problem" won't help anymore and people will be forced to care more about understanding again.
More precisely, I think producing a good, fast merge of ca. 5 sorted lists was a problem I didn't have good answers for, but maybe I was too fixated on a streaming solution and didn't apply enough tricks.
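For what it's worth, the standard trick here is a heap-based k-way merge. A minimal sketch (merge_k is a hypothetical name; Python's heapq.merge does essentially this for you):

    import heapq

    def merge_k(lists):
        # Seed the heap with the head of each non-empty list.
        # Tuples are (value, list index, element index) so ties compare cleanly.
        heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
        heapq.heapify(heap)
        while heap:
            value, i, j = heapq.heappop(heap)
            yield value
            if j + 1 < len(lists[i]):
                heapq.heappush(heap, (lists[i][j + 1], i, j + 1))

It streams (yields one element at a time) and runs in O(n log k) for n total elements across k lists; for k ≈ 5 a plain linear scan over the five current heads is also competitive, since the heap overhead barely pays for itself at that size.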
Maybe the hope is that you won't have to manually map the universal algorithm to your specific problem and can just train the transformer to figure it out instead, but there are few proofs that transformers can solve all problems in some complexity class through training instead of manual construction.
Also, solution testing is mandatory. Luckily, you can ask an RNG for that, too, as long as you have tests for the testers already written.
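In that spirit, a minimal randomized check, assuming the hypothetical merge_k sketch above; the oracle is just sorting the concatenation:

    import random

    def test_merge_k(trials=1000):
        for _ in range(trials):
            # Five random sorted lists of random lengths (possibly empty).
            lists = [sorted(random.choices(range(100), k=random.randrange(10)))
                     for _ in range(5)]
            expected = sorted(x for lst in lists for x in lst)
            assert list(merge_k(lists)) == expected

Of course, this only shifts the trust problem one level up, which is rather the point of the joke.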