> The same could apply to LLMs, or even different runs from the same LLMs.
No, but programs are. An LLM can be a programmer too, but it’s not a program in the way we want and expect programs to behave: deterministically. Even if a programmer could perform a TLS handshake manually very fast, ignoring the immense waste of energy, the program is a much better engineering component, simply because it is deterministic and does the same thing every time. If there’s a bug, it can be fixed, and then the bug will not reappear.
> If I ask ten programmers to come up with a solution to the same problem, I'm not likely to get ten identical copies.
Right, but you only want one copy. If you need different clients speaking with each other, you need to define a protocol and run conformance tests, which is a lot of work. It’s certainly doable, but you don’t want a different program every time you run it.
I really didn’t expect arguing for reproducibility in engineering to be controversial. The primary way we fix bugs is by literally asking for steps to reproduce. That’s not possible when you have a chaos agent in the middle, no matter how good it is. The only reasonable conclusion is to treat AI systems as entirely different components and isolate them, so that you keep the boring predictability of mechanistic programs. Basically, separating the engineering from the alchemy.
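One way to sketch that isolation, in Python: put a deterministic validation boundary between the model and the rest of the system, so everything downstream sees either a checked value or a plain, reproducible error. (`extract_total` and the `"total"` field are hypothetical names for illustration; the thread doesn’t specify an interface.)

```python
import json

def extract_total(llm_answer: str) -> int:
    """Deterministic boundary around a non-deterministic component.

    `llm_answer` is assumed to be the raw text reply from some LLM call.
    Past this function, the program is back in ordinary territory:
    a validated int, or a ValueError with a clear message.
    """
    try:
        data = json.loads(llm_answer)
    except json.JSONDecodeError as err:
        raise ValueError(f"model returned non-JSON output: {err}") from err
    total = data.get("total") if isinstance(data, dict) else None
    if not isinstance(total, int) or total < 0:
        raise ValueError(f"model returned invalid total: {total!r}")
    return total
```

The model stays a chaos agent, but its blast radius ends at the boundary: a bad reply becomes an ordinary error you can log, retry, or surface, rather than garbage flowing into the deterministic parts.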
Yes, programming isn’t always deterministic, and not just because the leftpad API endpoint is down, but by design: you can’t deterministically tell which button the user is going to click. So far so good.
But you program for the things you expect to happen, and handle the rest as errors. If you look at the branching topology of well-written code, the majority of paths lead to an error. Most strings are not valid JSON, but they are handled perfectly well as errors. The paths you didn’t predict can cause bugs, and those bugs can be fixed.
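Concretely, a minimal sketch of that pattern using Python’s standard `json` module (`parse_payload` is a hypothetical helper, not from the thread):

```python
import json

def parse_payload(raw: str) -> dict:
    # Most strings are not valid JSON. That's fine: the failure path
    # is itself deterministic and well-defined.
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"invalid payload: {err}") from err
    if not isinstance(value, dict):
        raise ValueError("payload must be a JSON object")
    return value
```

The same malformed input fails the same way every time, which is exactly what makes the bug-fix loop work.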
Within this system, you have effective local determinism. In practice, that gives you the following guarantee: if the program executed correctly up to point X, the local state is known. You build on top of that known state and continue the chain of bounded determinism, which is so incredibly reliable on modern CPUs that you can run massive financial transactions and be sure it works. Or run a weapons system, or a flight control system.
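The guarantee can be demonstrated with a toy state machine (the `step` function and the event strings are illustrative, not from the thread): if the next state depends only on the current state and the input, replaying the same inputs reproduces the same state, bit for bit.

```python
import hashlib

def step(state: bytes, event: bytes) -> bytes:
    # Toy deterministic transition: next state is a pure function
    # of (current state, event).
    return hashlib.sha256(state + event).digest()

# Replay the same event sequence twice; the final states are identical.
# This is the property that makes "steps to reproduce" meaningful.
events = [b"deposit:100", b"withdraw:30"]
a = b"genesis"
b = b"genesis"
for e in events:
    a = step(a, e)
    b = step(b, e)
assert a == b
```

Swap `step` for a component that answers differently on each call, and the replay guarantee, and with it the standard debugging loop, disappears.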
So when people point out that LLMs are non-deterministic (or technically unstable, to avoid bike-shedding), they mean that it’s a fundamentally different type of component in an engineering system. It’s not like retrying an HTTP request, because when things go wrong it doesn’t produce “errors”, it produces garbage that looks like gold.