That, and the thing "I" build with AI is not the same as the thing I would have built myself. So you're comparing some lowest-common-denominator version of the software with an original work created by a human. Not once have I gotten an LLM to output code that matches what I had in my mind's eye when I wrote the prompt.
Curious to hear more about this. I can't help feeling such an attempt is fundamentally flawed, just as software estimates are, because you're never building the same thing twice.