Example: "read this log file, extract XYZ from it, and show me a table of the results." Instead of having the agent pull the whole log file into its context and process it with raw LLM attention, you can get it to read a sample and then write a script to process the whole thing. This works particularly well when the task involves math, like computing a mean or a median: LLMs are bad at doing arithmetic on their own, and good at writing scripts that do it for them.
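The script the agent ends up writing for that kind of request is usually tiny. A sketch of what it might produce, for a made-up log format with a `latency_ms=` field (the format and field name are purely illustrative):

```
#!/usr/bin/env python3
"""Parse a whole log file and report latency stats, instead of asking
the LLM to eyeball the numbers inside its context window."""
import re
import statistics
import sys

# Hypothetical log line, e.g.:
#   2024-05-01T12:00:03Z GET /api/users 200 latency_ms=142
LATENCY = re.compile(r"latency_ms=(\d+)")

def main(path: str) -> None:
    values = []
    with open(path) as f:
        for line in f:
            m = LATENCY.search(line)
            if m:
                values.append(int(m.group(1)))

    if not values:
        sys.exit("no latency_ms entries found")

    # The LLM is bad at arithmetic; the script is not.
    print(f"count  {len(values)}")
    print(f"mean   {statistics.mean(values):.1f} ms")
    print(f"median {statistics.median(values):.1f} ms")
    print(f"max    {max(values)} ms")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: latency_stats.py LOGFILE")
    main(sys.argv[1])
```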
A lot of interesting techniques become possible when you have an agent that can write quick scripts or CLI tools for you, on the fly, and run them as well.
When you tell an LLM to check the code for errors, the LLM could simply "realize" that the problem is complex enough to warrant building [or finding+configuring] an appropriate tool to solve the problem, and so start doing that... but instead, even for the hardest problems, the LLM will try to brute-force a solution just by "staring at the code really hard."
(To quote a certain cartoon squirrel, "that trick never works!" And to paraphrase the LLM's predictable response, "this time for sure!")
That only holds for tasks where a programmatic script is a good fit, though. I don't think your example of "check the code for errors" really falls into that category - how would you even write a script to do that? "Staring at the code really hard" to catch errors that no static analysis tool could ever have caught is actually where an LLM really shines! Unless by "check for errors" you just meant "run a static analysis tool", in which case sure, it should run the linter or typechecker or whatever.
After all, when an immediate problem looks like it could come up again, “taking the opportunity” to solve it once and for all by introducing workflow automation is exactly what an experienced human engineer would likely do in that situation (if they aren’t pressed for time).
Hmm. My experience of "the average programmer" doesn't look like yours and looks more like the LLM :/
I'm constantly flabbergasted by how many devs fumble through digging into logs or extracting information or what have you, simply because it doesn't occur to them that tools can be composed together.
In my experience, only a rare few devs do this. Most will stick with the (broken or wrong) GUI tools they already know, made by others, out of convenience.
I used Claude to translate my application and asked it to translate every piece of text in the app to the best of its ability.
That worked great for one view, but when I asked it to translate the rest of the application in the same fashion, it got lazy and started writing a script to substitute some words instead of actually translating the sentences.
Let me illustrate with a specific, simple example: fixing linter or compiler errors. The problems I solve with this method are all verifiable via the command line (this can usually be documented in CLAUDE.md). Claude Code will continuously adjust the code based on the linter's output until all errors are resolved. This process often takes quite some time. I typically do this after completing a feature development. If Claude Code mistakenly thinks it has finished the task during one of these checks, it will halt the entire process. I then have to restart it using the same prompt to continue the task.
Therefore, I'm looking for an external tool to manage Claude Code, but I haven't found one yet. I've seen some articles suggesting a subagent approach, where a tool like Gemini CLI or Codex launches Claude Code. I haven't explored this method thoroughly yet.
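In the meantime, a thin wrapper script might be enough: keep relaunching Claude Code with the same prompt until the verification command passes. A rough sketch, assuming Claude Code's non-interactive `claude -p "<prompt>"` mode; the check command and the prompt are placeholders for whatever your project actually uses:

```
#!/usr/bin/env python3
"""Relaunch Claude Code with the same prompt until the project's
checks pass. Assumes the non-interactive `claude -p` mode; the check
command and prompt below are placeholders."""
import subprocess

CHECK = ["make", "check"]   # placeholder: your linter/typecheck/test command, quiet output
PROMPT = "Run `make check` and fix every error it reports. Don't stop until it exits 0."
MAX_ROUNDS = 5

for attempt in range(1, MAX_ROUNDS + 1):
    if subprocess.run(CHECK).returncode == 0:
        print(f"checks green after {attempt - 1} round(s)")
        break
    print(f"round {attempt}: checks still failing, relaunching Claude Code")
    subprocess.run(["claude", "-p", PROMPT])
else:
    print(f"giving up after {MAX_ROUNDS} rounds; checks still failing")
```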
I have a `task build` command that runs the linters, the tests, and the project build. All the commands have verbosity tuned down to the minimum so they don't waste context on useless crap.
Claude remembers to do it pretty well. I have it in my global CLAUDE.md, so I guess it has more weight? Dunno.
It doesn’t matter how many times you tell it in CLAUDE.md not to skip checks; it will eventually skip them anyway so it can commit. It’s infuriating.
I hope that as CC evolves there is a better way to tell/force the model to do things like that (linters, formatters, unit/e2e tests, etc).
Students don't get to choose whether to take the test, so why do we give AI the choice?
I had tried coding with ChatGPT a year or so ago and the effort needed to get anything useful out of it greatly exceeded any benefit, so I went into CC with low expectations, but I have been blown away.