The output is very problematic. It breaks its own work all the time and makes the same mistakes multiple times, so I have to retrace my steps. I’m going to have it write tests so it can better tell what it’s breaking.
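As a sketch of what that looks like in practice (the function and the "previously fixed bug" here are hypothetical, just to illustrate the pattern): a tiny pytest-style regression suite gives the agent a fast, mechanical signal when it re-breaks something, instead of the breakage resurfacing later during manual testing.

```python
# Hypothetical example: a small regression suite the agent is told to run
# after every change. Each test pins down behavior that was broken once,
# so the agent notices immediately if it breaks it again.

def parse_waypoint(line):
    """Parse a 'lat,lon' string into a (float, float) tuple."""
    lat, lon = line.split(",")
    return float(lat), float(lon)

def test_parse_waypoint():
    assert parse_waypoint("37.77,-122.42") == (37.77, -122.42)

def test_parse_waypoint_tolerates_whitespace():
    # Pins a hypothetical earlier fix: inputs with stray spaces
    # must keep working.
    assert parse_waypoint(" 37.77 , -122.42 ") == (37.77, -122.42)
```

The point isn't test coverage for its own sake; it's giving the agent a cheap way to tell "I changed X" apart from "I broke Y".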
But being able to say “take this GTK app and add a web server and browser based mode” and have it just kinda do it with minimal manual debugging is something remarkable. I don’t fully understand it; it is a new capability. I do robotics and I wish we had this for PCB design and mechanical CAD, but those will take much longer to solve. Still, I am eager to point Claude at the hand-written Python robotics stack from my last major project [1] and have it clean up and document what was a years-long chaotic prototyping process with results I was reasonably happy with.
The current systems have flaws, but if you look at where LLMs were five years ago and you see the potential value in fixing those flaws with agentic coding, it is easy to imagine they will be addressed. There will be higher-level flaws, and those will eventually be addressed too, and so on. Maybe not, but I’m quite curious to see where this goes, and what it means to be a human engineer in times like these.
[1] https://github.com/sequoia-hope/acorn-precision-farming-rove...
Implementation plans, intermediate bot/human review to control for complexity, convention adherence, and actual task completion, then providing guidance, managing the context, and a ton of other things to cage and harness the agent.
Then what it produces almost passes the sniff test. Add further bot and human code review, and we've got something that passes muster.
The siren song of "just do it/fix it" is hard to resist sometimes, especially as deadlines loom, but that way lies pain. Not a problem for a quick prototype or something throwaway (and OP is right, that it works at all is nothing short of marvelous), but to produce output usable in long-term maintainable software, a lot has to happen, and even then it's sometimes a crapshoot.
But why not do it by hand then? To me it still accelerates and opens up the possibility space tremendously.
Overall I'm bullish on agents improving past the current necessary operator-driven handholding sooner rather than later. Right now we have the largest collection of developer-agent RL data ever, with all the labs sucking up that juicy dev data. I think that will improve today's tooling tremendously.
I have no doubt that agents will become meaningfully useful for some things at some point. It just hasn't really happened yet, aside from the really simple stuff perhaps.
The shitty code it comes up with helps me a lot, because fixing broken stuff and unraveling the first principles of why it's broken is how I learn best. It helps me think and stay focused. When learning new areas, the goal is to grasp the subject matter enough to find out what's wrong with the generated code - and it's still a pretty safe bet that it will be wrong, one way or another. Whenever I attempt to outsource the actual thinking (because of feeling lazy or even just to check the abilities of the model), the results are pretty bad and absolutely nowhere near the quality level of anything I'd want to sign my name under.
Of course, some people don't mind and end up wasting other people's time with their generated patches. It's not hard to find them around. I have higher standards than replying "dunno, that's what the AI wrote" when a reviewer asks why something is done this particular way. Agentic tools tear down even more of the walls that would let you stop for a moment and notice the sloppiness of that output. They just let the model do more of the things it's not very good at, and encourage it to flood the code with yet another workaround for an issue that would disappear completely had you spent two minutes pondering the root cause (which you won't do, because you don't have a mental model of what the code does, because you let the agent do it for you).
I use Claude Code with a dozen different hand-tuned subagent specs and a comprehensive CLAUDE.md specifying how to use them. I review every line of code before committing (turning off auto-commit was the very first instruction). It can now produce a full PR that needs no major changes, often with just one or two follow-up tweak requests.
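For anyone unfamiliar with the setup being described: Claude Code reads project instructions from a CLAUDE.md file at the repo root, and subagents are defined as markdown files with frontmatter under `.claude/agents/`. The excerpt below is purely illustrative, not the commenter's actual config; the rule wording and the `reviewer` agent name are made up for the example.

```markdown
<!-- CLAUDE.md (illustrative excerpt, not the commenter's real file) -->
# Project rules
- Never commit automatically; leave staging and commits to the human reviewer.
- After any change, run the test suite and report failures verbatim.
- Before declaring a task done, hand the diff to the `reviewer` subagent
  (defined in .claude/agents/reviewer.md) for a convention-adherence pass.
```

The "turn off auto-commit" rule in particular is what keeps every line reviewable before it lands, which is the workflow described above.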
With subagents it can sometimes be running for an hour or more before it is done, but I don’t have to babysit it anymore.