For more important stuff, like anything that falls under my supervision, I will test the branch and carefully check the implementation. And I do this for each update to the PR. That takes a lot longer.
So I’m wondering: how do you context-switch between many agents running and proposing diffs, especially if you need to vet the changes? And how do you manage module dependencies, where an update by one task can subtly influence the implementation of another?
I’m wondering this too. But from what I have seen, I think most people doing this are not really reading and vetting the output. It’s just faster, parallelized vibe coding.
Not saying that’s what the parent is doing, but it’s common.
As for the people doing parallel work because they want stuff to “happen faster”, I am convinced most of them don’t really read (or probably even understand) the code it produces.
Honestly, I've seen too many fairly glaring mistakes in every model I've tried, which signals that they can't even get the easy stuff right consistently. In the language I use most (C++), if they can't do that, how can I trust them to get all the very subtle things right? For example, they very often produce code that holds some form of dangling reference, and when I say "hey, don't do that", they fall back to something very inefficient, like copying things all over the place.
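To make that concrete, here's a minimal made-up sketch of the pattern (make_label is a hypothetical stand-in, not actual model output):

    #include <iostream>
    #include <string>
    #include <string_view>

    std::string make_label() { return std::string("agent-") + "42"; }

    int main() {
        // The kind of bug I mean: a string_view bound to a temporary.
        // The temporary std::string dies at the end of this statement,
        // so `v` dangles immediately.
        std::string_view v = make_label();
        std::cout << v << '\n';  // undefined behavior: reads destroyed storage

        // When told "don't do that", the typical fallback is to give up
        // on views entirely and copy the data around instead:
        std::string owned = make_label();  // owning object: safe
        std::string_view safe = owned;     // view into a live object
        std::cout << safe << '\n';
    }

The right fix is usually to keep an owning std::string alive at least as long as the view, not to abandon views and copy everywhere.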
I am very grateful that they can churn out a comprehensive test suite in gtest, though, and write other scripts to run tests, do a release, and such. The relief from that tedium is welcome for sure!
I think there are opportunities to give special handling to the markdown docs and diagrams Claude likes to make along the way, to help with review.
I would argue you haven't covered any.
Why not just skip the reviews, then? If you can trust the models to have the necessary intelligence and context to review properly, they should be able to code properly in the first place. Obviously that's not where models are at today.
Here we are talking about the same model doing the review (even if you use a different model provider, it's still trained on essentially the same data, with the same objective and very similar performance).
We have had agentic systems where one agent checks the work of another for 2+ years now. This isn't a paradigm pushed by AI coding model providers, because it doesn't really work that well; human review is still needed.
But that was two weeks ago; maybe it’s different today
Right now, background agents have two major problems:
1. There is some friction in getting the isolated environment working correctly. The difficulty depends on the specifics of each project, ranging from "select this universal container" to "it's going to be hell getting all of your dependencies working". Working in your IDE pretty much solves that - it's likely a place where everything is already set up.
2. People need to learn how agents build code. Watching an agent work in your IDE while being able to interject and correct it is extremely helpful to long-term success with background agents.
My preferred way to vibe code is to lock in on a single goal and iterate towards it. When I'm waiting for stuff to finish, I'm exploring docs or info to figure out how to get closer. Reviewing the existing codebase or changes is also super useful for me to grasp where I'm up to and what to do next. This idea of managing swarms of agents for different tasks does not gel with me: too much context switching and multitasking.
Side note: you should look into electron-trpc. It greatly simplifies IPC handling.
Regarding your webpage: I wish you would vibe away the annoying header that comes down every time I scroll up just a tiny little bit.
Sounds like you're limiting yourself to users who are comfortable paying a $100-200 monthly subscription, or even thousands per month at API prices.
Claude Code is expensive, but I was hoping we weren't going to build tooling that exacerbates this issue simply because, for some of us, money is less of an issue than it is for most of us.
Having a nice way to manage the worktrees sounds great, but rate limiting still sounds like an issue with this approach.
https://docs.anthropic.com/en/docs/claude-code/common-workfl...
One must also always be aware that an LLM WILL ALWAYS DO what you ask of it. Often you ask for the wrong thing, and you need to rethink.
Maybe I am inefficient, though; I really only use at most two additional worktrees at the same time.
What? That's not my experience at all. Especially not "always"
I cannot count how many times that, or something like it, has happened to me.
Don't get me wrong, I'm a big fan and constant user of all these things, but I would say they frequently have problems following prompts.
Personally, I'm running two accounts and switching between them for maximum productivity. Just as a function of what my time is worth, it's a no-brainer.
https://github.com/stravu/crystal/actions/runs/15791009893/a...
I built a UI to manage this, and it is starting to turn into a new type of IDE, based around agent management and review rather than working on one thing at a time.
https://github.com/stravu/crystal