Reading that post sent me down the path to this one. The stack order makes total sense, although in practice steps 1 and 2 may merge into a single product with two distinct steps.
Step 3 is interesting too - my suspicion is that, as the models get better, ~70% of PRs will be too minor to need human review, but the top 30% (the genuinely complex changes) will still need it, because there will be differing opinions on what is and isn't the right way to make that kind of change.
Before running LLM-generated code through yet more LLMs, you can run it through traditional static analysis (linters, SAST, auto-formatters). They aren't flashy, but they're deterministic: the same input produces the same findings every time.
That consistency is critical if you want to pass/fail a build on the results. Nobody wants a flaky code-review robot, for the same reason nobody wants flaky tests.
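For example, a Git pre-commit hook gating on those tools could look like this rough Python sketch (the specific commands, qlty check and ruff check, are my assumptions; swap in whatever your stack uses):

    #!/usr/bin/env python3
    # Pre-commit gate: run deterministic checks before any LLM sees the code.
    # The commands below are illustrative assumptions; substitute your own tools.
    import subprocess
    import sys

    CHECKS = [
        ["qlty", "check"],       # assumed Qlty CLI lint entrypoint
        ["ruff", "check", "."],  # example Python linter
    ]

    def main() -> int:
        for cmd in CHECKS:
            # Same staged code in, same verdict out; no flakiness to argue with.
            if subprocess.run(cmd).returncode != 0:
                print(f"commit blocked by: {' '.join(cmd)}", file=sys.stderr)
                return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main())

Save it as .git/hooks/pre-commit and mark it executable, and the cheap deterministic checks always run first.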
I imagine code review will evolve into a three-tier pyramid:
1. Static analysis (instant, consistent) - e.g. using Qlty CLI (https://github.com/qltysh/qlty) as a Claude Code or Git hook
2. LLM review - slower, but able to catch semantic issues that static analysis can't
3. Human review - for the changes where judgment and opinion still matter
We make sure commits pass each tier in succession before moving on to the next; a rough sketch of that gating logic is below.
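Something like this minimal Python sketch, where llm_review() is a hypothetical placeholder (not a real API) and qlty check is again my assumption about the CLI entrypoint:

    #!/usr/bin/env python3
    # Three-tier gate: each tier must pass before the next, more expensive one runs.
    # llm_review() is a hypothetical stand-in for whatever model call you'd make.
    import subprocess

    def tier1_static() -> bool:
        # Tier 1: deterministic tools, safe to hard-fail a build on.
        return subprocess.run(["qlty", "check"]).returncode == 0

    def llm_review(diff: str) -> list[str]:
        # Hypothetical: send the diff to a model, return a list of findings.
        return []

    def tier2_llm(diff: str) -> bool:
        # Tier 2: semantic review, only reached once tier 1 is green.
        return not llm_review(diff)

    def review_commit() -> str:
        diff = subprocess.run(
            ["git", "diff", "--cached"], capture_output=True, text=True
        ).stdout
        if not tier1_static():
            return "blocked at tier 1 (static analysis)"
        if not tier2_llm(diff):
            return "blocked at tier 2 (LLM review)"
        return "passed tiers 1-2: escalate to tier 3 (human review)"

    if __name__ == "__main__":
        print(review_commit())

The ordering is the point: the instant, consistent tier filters out the noise before the slower and more expensive tiers ever run.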