Guess you haven't tried more advanced prompting techniques like CoT, agents, and RAG.
Yeah, if an LLM were truly capable of reasoning, then whenever it makes a mistake (e.g. due to randomness or a gap in its knowledge), pointing out the mistake and spelling out the steps to fix it should yield essentially a 100% success rate, since the human assistant has unlimited capacity to compensate for the LLM's weaknesses.
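Roughly the loop I have in mind, as a sketch (`ask_llm` here is just a placeholder for whatever model call and answer check you're using, not a real API):

```python
def solve_with_feedback(ask_llm, question, correct_answer, max_rounds=10):
    """Keep pointing out mistakes until the model gets it right or we give up.

    ask_llm: any callable that takes the conversation so far (a list of
    strings) and returns the model's latest answer as a string.
    """
    conversation = [question]
    for _ in range(max_rounds):
        answer = ask_llm(conversation)
        if answer.strip() == correct_answer:
            return True   # solved; a real reasoner should land here almost every time
        # Tell the model exactly what went wrong and let it retry.
        conversation.append(f"'{answer}' is wrong. Try again.")
    return False          # in practice it often just oscillates between the same few answers
```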
When you look at things like https://arxiv.org/abs/2408.06195 you notice that the number of tokens needed to solve trivial tasks is somewhat ridiculous: on the order of 300k tokens for a simple grade-school problem. That is roughly three hours of generation at 30 tokens/s, and enough text to fill 400 pages of a book.
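Back-of-envelope, using the rough rule of thumb of ~0.75 words per token:

```python
tokens = 300_000                    # rough budget for one problem
hours = tokens / 30 / 3600          # at 30 tokens/s
words = tokens * 0.75               # ~0.75 words per token (rough assumption)
print(f"{hours:.1f} hours, ~{words:,.0f} words")   # ~2.8 hours, ~225,000 words
```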
---
LLM: The answer is A.
Me: That's wrong. Try again.
LLM: Oh I'm sorry, you're completely right. The answer is B.
Me: That's wrong. Try again.
LLM: Oh I'm sorry, you're completely right. The answer is A.
Me: Time to short NVDA.
LLM: As an AI language model without real-time market data or the ability to predict future stock movements, I can't advise on whether it's an appropriate time to short NVIDIA or any other stock.
---