“80%,” “outperformed,” “fraction of the cost”: you could make a lot of money if that were true, but a 5x productivity boost seems unjustified right now. I’m having a hard time finding problems where the output is even 1x (where I don’t spend more time babysitting the LLM than doing the task from scratch myself).
Depends what you're doing.
For "stay in your lane" stuff, I agree, it relatively sucks.
For "today I need to do stuff two lanes over" stuff, well, it still needs the babysitting, and I still wouldn't put it on tasks where I can't verify the output, but it definitely delivers a productivity boost IME.
Sorry you're downvoted, but I generally agree. When it comes to software, most organizations are Initech.
My lived experience in the software industry, at almost all levels over the last 25 years, leads me to believe that the vast majority of humans and teams of humans produce atrocious code that only wastes time, money, and people's patience.
Often, precisely because it is humans producing the code, other humans are unwilling to fully engage with, criticize, and improve that code, deferring instead to just passing it on to the next person, team, generation, whatever.
Yes, this perhaps happens better in some (very large and very small) organizations, but most often it only happens with the inclusion of horrendous layers of protocol, bureaucracy, more time, more emotional exhaustion, etc.
In other words, it's a very costly process to produce excellent code, in both real capital and human capital. It literally burns through actual humans and results in very bad health outcomes for most people in the industry, ranging from minor stuff to really major things.
The reality is that probably 80% of people working in the tech industry can be outperformed by an AI, and at a fraction of the cost. AIs can be tuned, guided, and steered to produce code that I would call exceptional compared even to most developers who have been in the field for 5 years or more.
You probably arrive at this fallacy because you have worked in one of those very small or very large companies that takes producing code seriously, and you believe your experience represents the vast majority of the industry. In fact, the middle area is where most code is being "produced," and if you've never been fully engaged in those situations, you may literally have no idea of the crap that's being produced and shipped on a daily basis. These companies have no incentive to change: they make lots of money doing this, and fresh meat (humans) is relatively easy to come by.
Most of these AI benchmarks try to get LLMs to produce outputs at the scale and quality of one of those exceptional organizations, when in fact the real benefits will come in the bulk of organizations that cannot do this stuff, where AI will produce code as good as or better than a team of mediocre developers slogging away at a mediocre, but profitable, company.
Yes, there are higher levels of abstraction around code, such as getting it deployed, comprehensive testing, triaging issues, QA, blah blah, that humans are going to be better at for now, but I see many of those issues being addressed by some kind of LLM system sooner or later.
Finally, I think most of the friction people are seeing right now in their organizations comes from the wildly ad hoc way people and organizations are using AI, not so much from the technological abilities of the models themselves.