SchemaLoad
Building something from scratch, where there are plenty of public examples on GitHub, seems to be the easiest case. Put these agents on a real, existing codebase and ask them to fix a bug, and they become useless.
I think this would vary a lot between "real" codebases. I have had a lot of success with somewhat stricter frameworks that use typed interfaces, require well-defined unit tests, and have modules which encapsulate a lot of logic.
Basically like Java Spring Boot or NestJS type projects.