Managing context goes a long way, too. I clear context for every new task and keep the local context files up to date with key info to get the LLM on target quickly.
Aggressively recreating your context is still the best way to get good results from these tools, too, so it has a secondary benefit.
The other skill is knowing exactly when to roll up your sleeves and do it the old-fashioned way: which things these tools are actually useful for, and which things they aren't.
My take is that the overlap is strongest with engineering management. If you can learn how to manage a team of human engineers well, that translates to managing a team of agents well.
None of that knowledge will become useless; only the skill of working around the current limitations of agents will.
If I want to start a new task, I /clear and then tell it to re-read the CLAUDE.md document where I put all of the quick context: a description of the project, key goals, where to find key code, reminders about which tools to use, and so on. I aggressively update this file whenever I notice something it keeps forgetting or looking up. I know some people have the LLM update their context file, but I just do it myself, with seemingly better results.
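For concreteness, a primer like that might look something like this (an illustrative skeleton; the project details are made up):

    # CLAUDE.md  (illustrative example, not a real project)
    ## Project
    CLI tool that syncs invoices from a payment API into a local SQLite database.
    ## Key goals
    - Keep the sync idempotent; never insert the same invoice twice.
    ## Where to find key code
    - Sync logic: src/sync.py
    - Schema and migrations: src/schema.sql
    ## Reminders
    - Run the tests with pytest -q before calling a task done.
    - Don't reformat files unrelated to the current task.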
Using /compact burns through a lot of your usage quota and retains a lot of things you may not need. Giving it directions like “starting a new task doing ____, only keep necessary context for that” can help, but hitting /clear and having it re-read a short context primer is faster and uses less quota.
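In practice, the first message after /clear can be as short as something along the lines of "Read CLAUDE.md, then let's work on ____"; the primer carries the rest of the setup.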
I do wish that ChatGPT had a toggle next to each project file; as it is, you have to delete and re-upload files, or create separate projects, to get different combinations of files.
So if you look at the total cost of running the benchmark, it's surprisingly similar to other models -- the higher price per token is offset by the significantly smaller number of tokens required to complete a task.
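As a purely illustrative calculation (made-up numbers, not any model's real pricing): at $30 per million output tokens, a task that finishes in 20k tokens costs $0.60, while at $10 per million tokens but 70k tokens to finish the same task, it costs $0.70. The per-token premium gets swallowed by the shorter output.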
See "Cost to Run Artificial Analysis Index" and "Intelligence vs Output Tokens" here
https://artificialanalysis.ai/
...With the obligatory caveat that benchmarks are largely irrelevant to real-world tasks, and you need to test the model on your actual task to see how well it does!
I consistently run into limits with CC (Opus 4.5) -- but even though Codex seems to spend significantly more tokens, its quota limit appears to be much higher?