Comment by pelagicAustral

pelagicAustral Jun 19, 2025 parent

I think that if you give the same task to three different developers you'll get three different implementations. It's not a random result if you do get the functionality that was expected, and at that, I do think the prompt plays an important role in offering a view of how the result was achieved.

klabb3 Jun 19, 2025

> I think that if you give the same task to three different developers you'll get three different implementations.

Yes, but if you want them to be compatible you need to define a protocol and conformance test suite. This is way more work than writing a single implementation.

The code is the real spec. Every piece of unintentional non-determinism can be a hazard. That’s why you want the code to be the unit of maintenance, not a prompt.

imiric Jun 19, 2025

I know! Let's encode the spec into a format that doesn't have the ambiguities of natural language.

klabb3 Jun 19, 2025

Right. Great idea. Maybe call it ”formal execution spec for LLM reference” or something. It could even be versioned in some kind of distributed merkle tree.

blobbers 4 days ago

Interestingly, I was generating some scraping code today. I prompted it fairly generically and it decided to spit out some Selenium code. I reported the stack trace failure, and it gave me a new script with playwright. That failed also and it gave me some suggestions to fix it. I asked it to update the whole script rather than snippets, and it responded with "Hey let's not use either of these and here we'll use the site's API." and proceeded to do that.

Kind of crazy, it basically found 3 different hammers to hit the nail I wanted. The API unfortunately seems to be timeing out (I had to add the timeout=10 to the post u_u)

This item has no comments currently.