
> English is a terrible language for deterministic outcomes in complex/complicated systems

I think that you seem to be under the impression that Karpathy somehow alluded to or hinted at that in his talk, which indicates you haven't actually watched the talk, which makes your first point kind of weird.

I feel like one of the stronger points he made, was that you cannot treat the LLMs as something they're explicitly not, so why would anyone expect deterministic outcomes from them?

He's making the case for coding with LLMs, not letting the LLMs go by themselves writing code ("vibe coding"), and understanding how they work before attempting to do so.


tudorizer
I watched the entire talk, quite carefully. He explicitly states how excited he was about his tweet mentioning English.

The disclaimer you mention was indeed mentioned, although it's "in one ear, out the other" with most of his audience.

If I give you a glazed donut with a brief asterisk about how sugar can cause diabetes will it stop you from eating the donut?

You also expect deterministic outcomes when making analogies with power plants and fabs.

diggan OP
I think this is the moment you're referring to? https://youtu.be/LCEmiRjPEtQ?si=QWkimLapX6oIqAjI&t=236

> maybe you've seen a lot of GitHub code is not just like code anymore there's a bunch of like English interspersed with code and so I think kind of there's a growing category of new kind of code so not only is it a new programming paradigm it's also remarkable to me that it's in our native language of English and so when this blew my mind a few uh I guess years ago now I tweeted this and um I think it captured the attention of a lot of people and this is my currently pinned tweet uh is that remarkably we're now programming computers in English now

I agree that it's remarkable that you can tell a computer "What is the biggest city in Maresme?" and it tries to answer that question. I don't think he's saying "English is the best language to make complicated systems uncomplicated with", or anything to that effect. Just like I still think "Wow, this thing is fucking flying" every time I sit onboard an airplane, LLMs are kind of incredible in some ways, yet so "dumb" in some other ways. It sounds to me like he's sharing a similar sentiment but about LLMs.

> although it's "in one ear, out the other" with most of his audience.

Did you talk with them? Otherwise this is just creating an imaginary argument against people you just assume didn't listen.

> If I give you a glazed donut with a brief asterisk about how sugar can cause diabetes will it stop you from eating the donut?

If I wanted to eat a donut at that point, I guess I'd eat it anyway? But my aversion to risk (or rather the lack of it) tends to be non-typical.

What does my answer mean in the context of LLMs and non-determinism?

> You also expect deterministic outcomes when making analogies with power plants and fabs.

Are you saying that the analogy should be deterministic or that power plants and fabs are deterministic? Because I don't understand if the former, and the latter really isn't deterministic by any definition I recognize that word by.

tudorizer
> That's a lot of people to talk to in a day more or less, since the talk happened. Were they all there and you too, or you all had a watch party or something?

hehe, I wish.

The topics in the talk are not new. They have been explored and pondered for quite a while now.

As for the outcome of the donut experiment, I don't know. You tell me. Apply it repeatedly at a big scale and see if you should alter the initial offer for best outcomes (as relative as "best" might be).

diggan OP
> The topics in the talk are not new.

Sure, but your initial dismissal ("95% X, 5% Y") is literally about this talk no? And when you say 'it's "in one ear, out the other" with most of his audience' that's based on some previous experience, rather than the talk itself? I guess I got confused what applied to what event.

> As for the outcome of the donut experiment, I don't know. You tell me. Apply it repeatedly at a big scale and see if you should alter the initial offer for best outcomes (as relative as "best" might be).

Maybe I'm extra slow today, how does this tie into our conversation so far? Does it have anything to do with determinism or what was the idea behind bringing it up? I'm afraid you're gonna have to spell it out for me, sorry about that :)

tudorizer
> Did you talk with them? Otherwise this is just creating an imaginary argument against some people you just assume they didn't listen.

I have, unfortunately. Start-up founders, managers, investors who taunt the need for engineers because "AI can fix it".

Don't get me wrong, there are plenty of "stochastic parrot" engineers even without AI, but still, not enough to make blanket statements.

diggan OP
That's a lot of people to talk to in a day more or less, since the talk happened. Were they all there and you too, or you all had a watch party or something?

Still, what's the outcome of our "glazed donut" argument, you got me curious what that would lead to. Did I die of diabetes?

jbeninger
I think the analogy is that vibe coding is bad for you but feels good. Like a donut.

But I'd say the real situation is more akin to "if you eat this donut quickly, you might get diabetes, but if you eat it slowly, it's fine", which is a bad analogy, but a bit more accurate.

Your experience with fabs must be somewhat limited if you think that the state of the art in fabs produces deterministic results. Please look up (or ask friends about) the typical yields and error mitigation features of modern chips, and try to visualize whether determinism is possible when the density of circuits starts to approach levels that cannot be inspected with regular optical microscopes anymore. Modern chip fabrication is closer to LLM code in even more ways than what is presented in the video.
tudorizer
Fair. No process is 100% efficient and the depths of many topics become ambiguous to the point where margins of error need to be introduced.

Chip fabs are defo far into said depths.

Must we apply this at more shallow levels too?

whilenot-dev
> Modern chip fabrication is closer to LLM code

As is, I don't quite understand what you're getting at here. Please just think that through and tell us what would happen to the yield ratio if the software running on all those photolithography machines weren't deterministic.

kadushka
An output of a fab, just like an output of an LLM, is non-deterministic, but is good enough, or is being optimized to be good enough.

Non-determinism is not the problem, it's the quality of the software that matters. You can repeatedly ask me to solve a particular leetcode puzzle, and every time I might output a slightly different version. That's fine as long as the code solves the problem.
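To make that concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption: the puzzle (two-sum), the function names, and the test cases are hypothetical stand-ins, not anything from the talk. Two differently written solutions are judged by the same acceptance check, which only looks at whether the answer is right, not at how the code was written.

```python
from typing import Callable, List, Tuple

Solver = Callable[[List[int], int], Tuple[int, int]]


def two_sum_bruteforce(nums: List[int], target: int) -> Tuple[int, int]:
    """One possible version: try every pair of indices."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return (i, j)
    raise ValueError("no pair found")


def two_sum_hashmap(nums: List[int], target: int) -> Tuple[int, int]:
    """A differently written version: remember earlier values in a dict."""
    seen = {}
    for j, value in enumerate(nums):
        i = seen.get(target - value)
        if i is not None:
            return (i, j)
        seen[value] = j
    raise ValueError("no pair found")


def is_valid(solver: Solver, nums: List[int], target: int) -> bool:
    """Check the property we care about, not the exact output or code."""
    i, j = solver(nums, target)
    return i != j and nums[i] + nums[j] == target


# Hypothetical acceptance cases for the puzzle.
CASES = [([2, 7, 11, 15], 9), ([3, 2, 4], 6), ([3, 3], 6)]


def accepts(solver: Solver) -> bool:
    """A solution passes if it is valid on every acceptance case."""
    return all(is_valid(solver, nums, target) for nums, target in CASES)
```

Both `accepts(two_sum_bruteforce)` and `accepts(two_sum_hashmap)` hold, while a solver that returns a wrong pair is rejected, regardless of which version happened to come out on a given attempt.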

The software running on the machines (or anywhere) just needs to be better (choose your metric here) than the software written by humans. Software written by GPT-4 is better than software written by GPT-3.5, and the software written by o3 is better than software written by GPT-4. That's just the improvement from the last 3 years, and there's a massive, trillion-dollar effort worldwide to continue the progress.

whilenot-dev
Hardware always involves some level of non-determinism, because the physical world is messier than the virtual software world. Every hardware engineer accepts that and learns how to design solutions despite those constraints. But you're right, non-determinism is not the current problem in some fabs, because the whole process has been modeled with it in mind, and it's the yield ratio that needs to be deterministic enough to offer a service. Remember the struggles in Intel's fabs? Revenue reflects that at fabs.

The software quality at companies like ASML seems to be in bad shape already, and I remember ex-employees stating that there are some team leads higher up who can at least reason about existing software procedures, their implementation, side effects and their outcomes. Do you think this software is as thoroughly documented as some open source project? The purchase costs for those machines are in the mid-3-digit million range (operating costs excluded) and they are expected to run 24/7 to be somewhat worthwhile. Operators can handle hardware issues on the spot and work around them, but what do you think happens with downtime due to non-deterministic software issues?

The output of the Verilog optimizer is different every time. The output of a fab is different in every batch. Each chip in a batch is different from the others in that batch. Quality control drops the fraction of truly poor chips, and hardware design features might downgrade some of the partially failed chips to be classified as lesser versions of the same initial design. The final chips work as intended, mostly, but perhaps the error tolerance to overclocking or the mean time between failures is slightly different between chips. We can all work with them just fine almost all the time.

The same principles apply to complex LLM-orchestrated code projects. I don't mind if my compiler gives different code each time because it uses a stochastic optimizer, but I want my code to do what I want and to not fail more than a certain tolerance I have for this code, which depends on the application. By giving more insight into the layers of testing to more people, and by encouraging the new documentation practices that Andrej mentioned, LLM coding will change the practice of software engineering rather dramatically.

Code 2.0 was flexible and could yield results that were better than human-coded efforts for complex problems, but the architecture, code, and data were selected by humans. In code 3.0 humans have access to (non-deterministic) building blocks that are written in natural language, and to bug fixes and feature additions that happen in a conversational style. Similar engineering principles as with code 1.0 still apply (even more so than with code 2.0, unless the product is a neural net), but the emphasis on verification has increased dramatically as a fraction of the total effort, even though the total effort has gone down a lot. I can't wait to see increased help in code verification efforts from this batch of people in the AI startup school as a result of Andrej's presentation.
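The binning idea above can be sketched as a toy simulation in Python. All the numbers here are made-up assumptions for illustration (the 3.5 GHz and 3.0 GHz thresholds, the Gaussian spread, the bin names); they are not real fab data. The point is only that each chip comes out different, and quality control sorts the variation into "ship as designed", "ship as a lesser version", or "drop".

```python
import random
from collections import Counter


def bin_chip(max_stable_ghz: float) -> str:
    """Classify one chip by its measured maximum stable clock.

    The thresholds are hypothetical, chosen only for illustration.
    """
    if max_stable_ghz >= 3.5:
        return "full_spec"
    if max_stable_ghz >= 3.0:
        return "downgraded"  # sold as a lesser version of the same design
    return "rejected"        # dropped by quality control


def simulate_batch(n_chips: int, seed: int) -> Counter:
    """Draw a batch of chips with random variation and bin each one.

    The seed only makes this toy reproducible; a real batch has no seed.
    """
    rng = random.Random(seed)
    return Counter(bin_chip(rng.gauss(3.6, 0.3)) for _ in range(n_chips))


def shipped_fraction(bins: Counter) -> float:
    """Fraction of the batch that ships in some form (the toy 'yield')."""
    total = sum(bins.values())
    return (bins["full_spec"] + bins["downgraded"]) / total
```

Calling `simulate_batch(1000, seed=1)` and then `shipped_fraction` on the result gives one batch's yield; changing the seed changes the individual chips, while the process around them (thresholds, binning) stays fixed, which is the kind of tolerance-based verification the comment argues for.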
fifilura
Either way, I am not sure it is a requirement on HN to read/view the source.

Particularly not a 40min video.

Maybe it is tongue-in-cheek, maybe I am serious. I am not sure myself. But sometimes the interesting discussions come from what is on top of the poster's mind when viewing the title. Is that bad?

diggan OP
> Is that bad?

It doesn't have to be. But it does get somewhat boring and trite after a while when you start noticing that certain subjects on HN tend to attract general and/or samey comments about $thing, rather than the submission topic within $thing, and I do think that is against the guidelines.

> Please don't post shallow dismissals [...] Avoid generic tangents. Omit internet tropes. [...]

The specific part of:

> English is a terrible language for deterministic outcomes

Strikes me as a generic tangent about LLMs, and the comment as a whole feels like a shallow dismissal of the entire talk, as Karpathy never claims English is a good language for deterministic outcomes, nor have I heard anyone else make that claim.

tudorizer
Might sound like a generic tangent, but it's the conclusion people will leave from the talk.
diggan OP
But is it curious? Is it thoughtful and substantive? Maybe it could have been thoughtful, if it felt like it was in response to what was mentioned in the submission.
karaterobot
It's odd! The guidelines don't say anything about having to read or watch what the posts linked to, all they say is it's inappropriate to accuse someone you're responding to of not having done so.

There is a community expectation that people will know what they're talking about before posting, and in most cases that means having read the article. At the same time, I suspect that in many cases a lot of people commenting have not actually read the thing they're nominally commenting on, and they get away with it because the people upvoting them haven't either.

However, I think it's a good idea to do so, at least to make a top-level comment on an article. If you're just responding to someone else's comment, I don't think it's as necessary. But to stand up and make a statement about something you know nothing about seems buffoonish and would not, in general, elevate the level of discussion.

tudorizer
I accept any equivalents of reading comprehension tests to prove that I watched the video, as I have many of Andrej's in the past. He's generally a good communicator, defo easy to follow.
