On a more serious note: I think the high-level structuring of the architecture, and then the breakdown into tactical solutions — weaving the whole program together — is a fundamental limitation. It's akin to theorem-proving, which is just hard. Maybe it's just a scale issue; I'm bullish on AGI, so that's my preferred opinion.
Try this prompt: "Please rate this business plan on a scale of 1-100 and provide bullet points on how it can be improved without rewriting any of it: <business plan>"
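If it helps, here is a minimal sketch of sending that prompt through a chat-completion API. The openai client usage and the model name are my assumptions, not part of the suggestion above; swap in whichever provider and model you actually use.

```python
# Minimal sketch: send the rating prompt to a chat-completion API.
# Assumes the openai>=1.0 Python client and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

business_plan = "..."  # paste the full business plan text here

prompt = (
    "Please rate this business plan on a scale of 1-100 and provide "
    "bullet points on how it can be improved without rewriting any of it: "
    f"{business_plan}"
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name; any chat model should work
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```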
Edit: On second thought, maybe beyond a certain minimum context window size it's possible to phrase the instructions so that, at any point in the process, the LLM works at a suitable level of abstraction, more like humans do.
For human-like learning it would need to update its state (learn) on the fly as it does inference.
Is it token limitations, or accuracy degrading the further you get into the solution?