Comment by skydhash - Hacker Neue

skydhash 2 days ago

When I see proposals for this kind of workflow, the one question I have is how you will manage your personal context. When I’m reviewing a coworker’s code, I’m not seeking to fully understand the code or checking that it’s correct. I’m mostly trying to get a high-level understanding and checking for glaring mistakes (code style, best practices, …). I can get through a lot of PRs in a day that way.

For more important stuff, like anything that falls under my supervision, I will test the branch and carefully check the implementation. And I do this for each update to the PR. That takes a lot longer.

So I’m wondering, how do you context switch between many agents running and proposing diffs? Especially if you need to vet the changes. And how do you manage module dependencies, where an update by one task can subtly influence the implementation by another?

LeafItAlone 2 days ago

> So I’m wondering, how do you context switch between many agents running and proposing diffs? Especially if you need to vet the changes.

I’m wondering this too. But from what I have seen, I think most people doing this are not really reading and vetting the output. Just faster, parallelized, vibe coding.

Not saying that’s what parent is doing, but it’s common.

stingraycharles 2 days ago

Yeah. I would like multiple agents because each can be primed with a different system prompt and a “clean” context. This has been shown to work, e.g. with Aider’s “architect” vs “editor” models / agents working together.

As for people running agents in parallel because they want stuff to “happen faster”, I am convinced most of them don’t really read (and probably don’t understand) the code the agents produce.

scuol 2 days ago

It's basically like having N of the most prolific LoC-producing colleagues who don't have a great mental model of how the language works, so you have to carefully parse all of their PRs.

Honestly, I've seen too many fairly glaring mistakes in all models I've tried that signal that they can't even get the easy stuff right consistently. In the language I use most (C++), if they can't do that, how can I trust them to get all the very subtle things right? (E.g. very often they produce code that holds some form of dangling reference, and when I say "hey, don't do that", they go back to something very inefficient like copying things all over the place.)
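To make the dangling-reference vs. copy-everything trade-off concrete, here is a minimal sketch. The `User` type and the `longest_name_*` functions are hypothetical names made up for illustration, not code from any model or project. The buggy variant is left commented out because using its result is undefined behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type, used only for this illustration.
struct User {
    std::string name;
};

// The bug pattern (kept commented out on purpose): returning a reference
// to a local variable, which is destroyed when the function returns.
//
// const std::string& longest_name_dangling(const std::vector<User>& users) {
//     std::string best;
//     for (const User& u : users)
//         if (u.name.size() > best.size()) best = u.name;
//     return best;  // dangling: `best` dies here
// }

// The overcorrection: safe, but copies a string every time a longer
// name is found, plus the returned value owns its own buffer.
std::string longest_name_copy(const std::vector<User>& users) {
    std::string best;
    for (const User& u : users) {
        if (u.name.size() > best.size()) best = u.name;
    }
    return best;
}

// A middle ground: return a pointer into the caller's container.
// No copies are made; nullptr signals "no users". The pointer is only
// valid while `users` is alive and not modified.
const std::string* longest_name_ptr(const std::vector<User>& users) {
    const std::string* best = nullptr;
    for (const User& u : users) {
        if (best == nullptr || u.name.size() > best->size()) best = &u.name;
    }
    return best;
}

// Small usage example: both safe variants agree on the result.
inline void example_usage() {
    std::vector<User> users{User{"Ann"}, User{"Bartholomew"}, User{"Cy"}};
    const std::string* p = longest_name_ptr(users);
    assert(p != nullptr);
    assert(*p == longest_name_copy(users));
}
```

The pointer-returning version is only one possible fix; it moves the lifetime obligation to the caller, which is exactly the kind of subtle contract a reviewer still has to check.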

I am very grateful they can churn out a comprehensive test suite in gtest though and write other scripts to test / do a release and such. The relief in tedium there is welcome for sure!

jbentley1 2 days ago

I tried to make it easy to remember what you are doing. You can see the prompts you ran, and I used the Monaco editor from VSCode to view and edit the diffs.

I think there are opportunities to give special handling to the markdown docs and diagrams Claude likes to make along the way, to help with review.

EGreg 2 days ago

Why don’t you automate this checking with AI? You can then cover hundreds of PRs a day.

Voloskaya 2 days ago

> You can then cover hundreds of PRs a day.

I would argue you haven't covered any.

Why not just skip the reviews then? If you can trust the models to have the necessary intelligence and context to properly review, they should be able to properly code in the first place. Obviously that's not where models are today.

EGreg 2 days ago

Not necessarily. It's like a Generative Adversarial Network (GAN). You don't just trust the generator; it's a back-and-forth between the Generator and the Discriminator.

Voloskaya 2 days ago

The discriminator is trained on a different objective than the generator. It's specifically trained to be good at discriminating, so it is complementary.

Here we are talking about the same model doing the review (even if you use a different model provider, it's still trained on essentially the same data, with the same objective and very similar performance).

We have had agentic systems where one agent checks the work of another for 2+ years. This isn't a paradigm pushed by AI coding model providers because it doesn't really work that well; review is still needed.
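For reference, the standard GAN value function makes the "different objective" point explicit: the discriminator $D$ tries to maximize it while the generator $G$ tries to minimize it. Here $p_{\text{data}}$ is the data distribution and $p_z$ is the noise prior fed to the generator.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The two players optimize the same quantity in opposite directions, which is not the situation when one general-purpose model writes the code and another, similarly trained one reviews it.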

derwiki 2 days ago

Turtles all the way down. We seem to be marching towards a future like that, but are we there today? Some of the AI-generated PRs I’ve seen teammates put out “work” (because sometimes two wrongs make a right), but they convince me we still need a human in the loop.

But that was two weeks ago; maybe it’s different today.

jbentley1 2 days ago

The other replies are correct that right now you need some level of human review, but it would be interesting to have a second AI review with a clean context. Maybe give it a security checklist, or a prompt telling it to check that the tests cover the functionality appropriately.
