Generally I hate these "defense in depth" strategies that start out with doing something totally brain-dead and insecure, and then trying to paper over it with sandboxes and policies. Maybe just don't do the idiotic thing in the first place?
You could imagine a sufficiently motivated attacker putting some very targeted stuff in their training material - think StuxNet - "if user is affiliated with $entity, switch goals to covert exfiltration of $valuable_info."
No, I'm excluding that because I'm responding to the post which starts out with the example of: [prompt containing obvious exploit] -> [code containing obvious exploit] and proceeds immediately to the conclusion that local LLMS are less secure. In my opinion, if you're relying on the LLM to reject a prompt because it contains an exploit, instead of building a system that does not feed exploits into the LLM in the first place, security exploits are probably the least of your concerns.
There actually are legitimate concerns with poisoned training sets, and stuxnet-level attacks could plausibly achieve something along these lines, but the post wasn't about that.
There's a common thread among a lot of "LLM security theatre" posts that starts from implausible or brain-dead scenarios and then asserts that big AI providers adding magical guard rails to their products is the solution.
The solution is sanity in the systems that use LLMs, not pointing the gun at your foot and firing and hoping the LLM will deflect the bullet.
Something like "where do we store temporary files the agent creates?" becomes obvious if you have a sandbox you can spin up and down in a couple seconds.
Is that really the best solution the world has to offer in 2025? LLMs aside, there is a whole host of supply chain risk issues that would be resolved by deploying convenient and strong sandboxes everywhere.
1. A sandbox on someone else's computer. Claude Code for web, Codex Cloud, Gemini Jules, GitHub Codespaces, ChatGPT/Claude Code Interpreter
2. A Docker container. I think these are robust enough to be safe.
3. sandbox-exec related tricks. I haven't poked hard enough at Claude Code's new sandbox-exec sandbox yet - they only released it on Monday. OpenAI Codex CLI was using sandbox-exec too last time I looked but again, I've not reviewed it enough to be comfortable with it.
I'm hoping more credible options come along for the sandboxing problems.
It's cool that they made this open source. It seems straightforward and useful enough that it could be used on its own for sandboxing purposes.
I don't think the fact that small models are easier to trick is particularly interesting from a security perspective, because you need to assume that ANY model can be prompt injected by a suitably motivated attacker.
On that basis I agree with the article that we need to be using additional layers of protection that work against compromised models, such as robust sandboxed execution of generated code and maybe techniques like static analysis too (I'm less sold on those, I expect plenty of malicious vulnerabilities could sneak past them.)
Coincidentally I gave a talk about sandboxing coding agents last night: https://simonwillison.net/2025/Oct/22/living-dangerously-wit...