Because the Even Bitterer Lesson is that The Bitter Lesson is true but not actionable. You still have to build the inefficient "clever" system today, because The Bitter Lesson only tells you that your system will be obliterated; it doesn't tell you when. Some systems built today will last for years, others will last for weeks, others will be obsoleted before release, and we don't know which are which.
I’m hoping someday that dude releases an essay called The Cold Comfort. But it’s impossible to predict when or who it will help, so don’t wait for it.
Yeah, I get it. I just don't like that it's always sorta framed as a can't-win-don't-try message.
The principle of optimal slack tells you that if your training will take N months on current computing hardware, you should go spend Y months at the beach before buying the computer, and you will then complete your task in better than N-Y months thanks to improvements in computing power. (A toy version of the arithmetic is sketched below.)
Of course, instead of the beach one could spend those Y months improving the algorithms... but it's never wise to bid against yourself if you don't have to.
A corollary is that to maximize your beach time you should work on the biggest N possible, neatly explaining the popularity of AI startups.
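For the curious, here's the toy arithmetic in Python. Everything in it is an assumption for illustration: compute is assumed to double every 18 months (a Moore's-law-style guess, pick your own figure), and the speedup is assumed to apply cleanly to training time.

    import math

    def optimal_wait(n_months, doubling_months=18.0):
        # Toy "optimal slack" model: training that takes n_months on
        # today's hardware takes n_months / 2**(t / doubling_months)
        # if you first spend t months at the beach. Minimize the
        # total elapsed time t + remaining training time.
        #
        # Setting d/dt [t + n * 2**(-t/d)] = 0 gives
        #   t* = d * log2(n * ln(2) / d),
        # clamped at zero: for small n you should just start now.
        d = doubling_months
        t = max(0.0, d * math.log2(n_months * math.log(2) / d))
        remaining = n_months / 2 ** (t / d)
        return t, t + remaining

    for n in (24, 60, 120):
        beach, total = optimal_wait(n)
        print(f"N={n:3d}: beach for {beach:4.1f} months, done in {total:5.1f}")

Under those assumptions a 60-month job is best started after about 22 months of beach and still finishes in about 48 months total, while a 24-month job you should just start now. A cute consequence of the model: whenever waiting is worth it at all, the post-beach training time works out to d/ln(2), roughly 26 months, no matter how big N is.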
There are some domains where the bitter lesson has big impacts on theory. The Peter Norvig vs. Noam Chomsky debate over whether brute-force compute over data can ever yield a full and complete theory of language is an example. That's a case where the path of "get a ton of data and handle it statistically" competes with the path of "build a complete and abstract understanding of the domain." Lots of resources and lifetimes of work are decided by which path you take.
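To make the "get a ton of data and handle it statistically" path concrete, the whole method fits in a dozen lines: count co-occurrences and divide. (Toy corpus and code are mine, not Norvig's; real systems smooth the counts, but the spirit is the same.)

    from collections import Counter, defaultdict

    # The statistical path in miniature: no grammar, no theory of
    # language, just counts over data. Corpus is made up.
    corpus = "the cat sat on the mat the cat ate the rat".split()

    bigrams = defaultdict(Counter)
    for prev, word in zip(corpus, corpus[1:]):
        bigrams[prev][word] += 1

    def p_next(prev, word):
        # P(word | prev) by maximum likelihood, straight from counts.
        total = sum(bigrams[prev].values())
        return bigrams[prev][word] / total if total else 0.0

    print(p_next("the", "cat"))  # 0.5 -- "cat" follows "the" in 2 of 4 cases

Scale the corpus up by ten orders of magnitude and this embarrassingly simple recipe starts outpredicting hand-built grammars, which is exactly what the debate is about: is that a theory of language, or just a good compressor?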
Agreed that overfitting to the bitter lesson often leads to slopping piles of compute and hardware at problems that could just be solved deterministically.
The solution to the puzzle is that "the bitter lesson" is about AI software systems, not arbitrary software systems. If you're writing a compiler, you're better off worrying about algorithms, etc. AI problems have an inherent vagueness to them that makes it hard to write explicit rules, and any explicit rules you write will end up being obsolete as soon as we have more compute.
This is all explained in the original essay: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
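A toy illustration of that inherent vagueness, using English plurals (my example, not from the essay): every explicit rule you write immediately starts collecting exceptions.

    def pluralize(word):
        # Hand-written rules for English plurals. Each rule sprouts
        # exceptions, and the exception list never closes.
        irregular = {"child": "children", "mouse": "mice", "sheep": "sheep"}
        if word in irregular:
            return irregular[word]
        if word.endswith(("s", "sh", "ch", "x")):  # bus -> buses
            return word + "es"
        if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
            return word[:-1] + "ies"               # city -> cities
        return word + "s"                          # the rule that started it all

    # Works until the next counterexample: "piano" -> "pianos" but
    # "hero" -> "heroes". A compiler's grammar, by contrast, is finite
    # and exact -- there, the rules really are the spec.
    print(pluralize("city"), pluralize("mouse"), pluralize("bus"))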
Maybe if you have infinite compute you don't worry about software design. Meanwhile in the real world...
Not only that, but where did all these compute-optimized solutions come from? Oh yeah: millions of man-hours of optimizing and testing algorithmic solutions. So unless you're some head-in-the-clouds tenured professor, just keep on doing your optimizations and your job as usual.