You can see the results in areas where most code is terrible but most people don't realise it (data science is where I see this most, since it's what I mainly do). I assume the same happens in areas like frontend, where I don't notice the badness because I'm not an expert.
The tricky part is that I don't think all programming is formal logic; only a small part of it is. And the fact that different code exists for different purposes really screws up an LLM's reasoning process unless you make it very clear which code is for what.
Why do you say this? The foundation of all of computer science is formal and symbolic logic.
There is a reason a lot of programmers see programming as having a lot in common with painting and other creative activities.
The space is just so large that everyone has their own "basis", which sometimes even shifts over time. They can still be good programmers imo.
Yes, but it also has to deal with "the real world", which is only logical if you can encode a near-infinite number of variables; instead, we create leaky abstractions in order to actually get work done.
my own solution? 1.56 seconds. I consider myself to be at an intermediate skill level, and while LLMs are useful, they likely won't replace any but the least talented programmers. Even then, I'd value a human with critical thinking paired with an LLM over an even more competent LLM.
Just for curiosity's sake, what language have you been trying to use?
this is not at all a sample of high-quality, well-educated-about-concurrency code, but it roughly matches a lot of Business™ code and also most less-mature open source code I encounter (which is most open source code). it's just not something most people are fluent in.
these same people using LLMs have generally produced much worse concurrent code, regardless of the model, their prompting sophistication, or thinking time, unless the task is extremely trivial (then it's slightly better, because it's at least tutorial-level correct) (and yes, in those cases they should have just used one of the many pre-existing libraries). doing anything beyond "5 workers on this queue plz" consistently ends up with major correctness flaws: often it works well enough while everything is running smoothly, but under pressure or in error cases it falls apart extremely badly... which is also true of most "how to write x concurrently" blog posts I run across - they're over-simplified to the point of being unusable in practice (e.g. by ignoring error handling) and far too inflexible to safely change for slightly different needs.
honestly I think it's mostly down to two things: a lack of quality training material (some obviously exists, but it's drowned out by flawed stuff), and an extreme sensitivity to subtle flaws (much more so than in normal code). so it's both bad at generalizing (not enough transitional examples between patterns), and its general inability to actually think introduces flaws that look like normal code, which humans are less likely to notice (due to their own lack of experience).
this is not to claim it's impossible to use them to write good concurrent code - there are many counter-examples showing it can be done. but it's a particularly error-prone area in practice, especially in languages without safer patterns or built-in verification.