
CaptainOfCoit
Codex (GPT-5) + Rust (with or without Tokio) seems to work out well for me; I ask it to run the program and validate everything as it iterates on a solution. I've used the same workflow with Python programs too and it seems to work OK, but not as well as with Rust.

Just for curiosity's sake, what language have you been trying to use?


Groxx
mostly Go, because that's what's used at work. for a variety of reasons, I have helped troubleshoot 100+ teams' projects, many of which have had concurrency issues that were either obvious in nearby code or actively causing the problem (which is why I was helping troubleshoot). same with several dozen "help us find a way to speed up [this activity]" teams' work.

this is not at all a sample of high-quality, well-educated-about-concurrency code, but it does roughly match a lot of Business™ code and also most less-mature open source code I encounter (which is most open source code). concurrency just isn't something most people are fluent in.

these same people using LLMs have generally produced much worse concurrent code, regardless of the model or their prompting sophistication or thinking time, unless it's extremely trivial (then it's slightly better, because it's at least tutorial-level correct) (and yes, they should have just used one of many pre-existing libraries in those cases). doing anything beyond "5 workers on this queue plz" consistently ends up with major correctness flaws - often it works well enough while everything is running smoothly, but under pressure or in error cases it falls apart extremely badly... which is true for most "how to write x concurrently" blog posts I run across too - they're over-simplified to the point of being unusable in practice (e.g. by ignoring error handling) and far too inflexible to safely change for slightly different needs.
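to make "falls apart in error cases" concrete, here's a rough stdlib-only Go sketch of the "5 workers on this queue" shape, with the error handling and early shutdown that the generated/blog-post versions usually drop - processJob and its failure condition are made up purely for illustration:

    package main

    import (
        "context"
        "fmt"
        "sync"
    )

    // processJob is a hypothetical stand-in for whatever each queue item needs.
    func processJob(ctx context.Context, job int) error {
        if job == 7 {
            return fmt.Errorf("job %d failed", job)
        }
        return nil
    }

    // runPool fans jobs out to n workers, stops handing out work after the
    // first failure, and reports that failure to the caller.
    func runPool(ctx context.Context, n int, jobs []int) error {
        ctx, cancel := context.WithCancel(ctx)
        defer cancel()

        jobCh := make(chan int)
        errCh := make(chan error, n) // buffered: each worker sends at most once

        var wg sync.WaitGroup
        for i := 0; i < n; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for job := range jobCh {
                    if err := processJob(ctx, job); err != nil {
                        errCh <- err
                        cancel() // tell the producer to stop feeding work
                        return
                    }
                }
            }()
        }

        // producer: stop early if any worker cancelled the context
    feed:
        for _, job := range jobs {
            select {
            case jobCh <- job:
            case <-ctx.Done():
                break feed
            }
        }
        close(jobCh)

        wg.Wait()
        close(errCh)

        var firstErr error
        for err := range errCh {
            if firstErr == nil {
                firstErr = err
            }
        }
        return firstErr
    }

    func main() {
        jobs := make([]int, 20)
        for i := range jobs {
            jobs[i] = i
        }
        if err := runPool(context.Background(), 5, jobs); err != nil {
            fmt.Println("pool stopped early:", err)
        }
    }

even this much is easy to get subtly wrong (unbuffered error channel, forgetting to close jobCh, a producer that blocks forever), which is roughly the point.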

honestly I think it's mostly due to two things: a lack of quality training material (some obviously exists, but it's overwhelmed by flawed stuff), and an extreme sensitivity to subtle flaws (much more so than normal code). so it's both bad at generalizing (not enough transitional examples between targets), and its general lack of ability to actually think introduces flaws that look like normal code, which humans are less likely to notice (due to their own lack of experience).
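a classic example of the kind of flaw I mean (hand-written here for illustration, not LLM output): appending to a shared slice from multiple goroutines. it reads like ordinary code, usually looks fine in a quick manual run, silently drops results under load, and the race detector (go run -race) flags it immediately:

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        urls := []string{"a", "b", "c", "d", "e", "f", "g", "h"}

        // buggy version: looks like normal code with goroutines sprinkled in,
        // but append reads and writes the shared slice header without
        // synchronization - goroutines can clobber each other's writes and
        // results silently go missing.
        var results []string
        var wg sync.WaitGroup
        for _, u := range urls {
            wg.Add(1)
            go func(u string) {
                defer wg.Done()
                results = append(results, "fetched:"+u) // DATA RACE
            }(u)
        }
        wg.Wait()
        fmt.Println("buggy:", len(results)) // may be less than 8

        // fixed version: same shape, shared write guarded by a mutex
        var mu sync.Mutex
        var safe []string
        for _, u := range urls {
            wg.Add(1)
            go func(u string) {
                defer wg.Done()
                mu.Lock()
                safe = append(safe, "fetched:"+u)
                mu.Unlock()
            }(u)
        }
        wg.Wait()
        fmt.Println("fixed:", len(safe)) // always 8
    }

the fix is trivial once you see it; the problem is that nothing about the buggy version looks wrong to someone who isn't already fluent.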

this is not to claim it's impossible to use them to write good concurrent code - there are many counter-examples that show it is possible. but it's a particularly error-prone area in practice, especially in languages without much in the way of safer patterns or built-in verification.
