SequoiaHope
I’m finding agentic coding to be a fascinating tool. The output is a mess but it takes so little input to make something quite functional. I had an app that I wrote with a python GUI framework I didn’t quite like. ChatGPT rewrote it to use GTK and it is so much faster now. Later Claude added a browser mode where the app can be run via GTK or a browser tab. I have never written a GTK app in my life past some hello world text box.

The output is very problematic. It breaks itself all the time, makes the same mistakes multiple times, and I have to retrace my steps. I’m going to have it write tests so it can better tell what it’s breaking.

But being able to say “take this GTK app and add a web server and browser based mode” and it just kinda does it with minimal manual debugging is something remarkable. I don’t fully understand it, it is a new capability. I do robotics and I wish we had this for PCB design and mechanical CAD, but those will take much longer to solve. Still, I am eager to point Claude at my hand written python robotics stack from my last major project [1] and have it clean up and document what was a years long chaotic prototyping process with results I was reasonably happy with.

The current systems have flaws but if you look at where LLMs were five years ago and you see the potential value in fixing the flaws with agentic coding, it is easy to imagine that those flaws will be addressed. There will be higher level flaws and those will eventually be addressed, etc. Maybe not, but I’m quite curious to see where this goes, and what it means for engineering as a human being at these times.

[1] https://github.com/sequoia-hope/acorn-precision-farming-rove...


seba_dos1
It is fascinating and it absolutely excels at writing barely-working, problematic code that nonetheless somehow appears to run. This helps me a lot, as having shitty code to fix makes my mind much more engaged than when I'm writing stuff from scratch, but making the model do more stuff autonomously rather than having me consciously review it at each step is only making it less useful, not more.
danielbln
I've noticed that the quality of the output can be improved dramatically, but it takes a lot of... work isn't the right word, prior knowledge, persistence and systems, maybe.

Implementation plans, intermediate bot/human review to control for complexity, convention adherence, and actual task completion, then providing guidance, managing the context, and a ton of other things to cage and harness the agent.

Then what it produces almost passes the sniff test. Add further bot and human code review, and we've got something that passes muster.

The siren song of "just do it/fix it" is hard to avoid sometimes, especially as deadlines loom, but that way lies pain. Not a problem for a quick prototype or something throwaway (and OP is right that it working at all is nothing short of marvelous), but to create output to be used in long-term, maintainable software a lot has to happen, and even then it's sometimes a crap shoot.

But why not do it by hand then? To me it still accelerates and opens up the possibility space tremendously.

Overall I'm bullish on agents improving past the current necessary operator-driven handholding sooner rather than later. Right now we have the largest collection of developer agent RL data ever, with all the labs sucking up that juicy dev data. I think that will improve today's tooling tremendously.

seba_dos1
Yes, it requires prior understanding of what you're attempting to do. That's more or less what I meant by "making your own brain work more" - if you treat it as an input for your brain to operate on and exercise your knowledge, it can boost your productivity. If you treat it as a tool that lets you think less, you end up with nothing but slop. Sometimes even slop will be useful, but the contexts where this is true are limited.

I have no doubt that agents will become meaningfully useful for some things at some point. This just hasn't really happened yet, aside from the really simple stuff perhaps.
