I feel like a danger with this sort of thing is that the capability of the system to use the right skill is limited by the little blurb you give about what the skill is for. Contrast with the way a human learns skills - as we gain experience with a skill, we get better at understanding when it's the right tool for the job. But Claude is always starting from ground zero and skimming your descriptions.

> Contrast with the way a human learns skills - as we gain experience with a skill, we get better at understanding when it's the right tool for the job.

Which is precisely why Richard Sutton doesn't think LLMs will evolve to AGI[0]. LLMs are based on mimicry, not experience, so it's more likely (according to Sutton) that AGI will be based on some form of RL (reinforcement learning) and not neural networks (LLMs).

More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence. So, to your point, the idea of a "skill" is more akin to a reference manual, than it is a skill building exercise that can be applied to developing an instrument, task, solution, etc.

[0] https://www.youtube.com/watch?v=21EYKqUsPfg

It's a false dichotomy. LLMs are already being trained with RL to have goal directedness.

He is right that non-RL'd LLMs are just mimicry, but the field already moved beyond that.

I wrote elsewhere but I’m more interpreting this distinction as “RL in real-time” vs “RL beforehand”.
This is referred to as “online reinforcement learning” and is already something done by, for example, Cursor for their tab prediction model.

https://cursor.com/blog/tab-rl

Not sure that’s the same. They just very frequently retrain and “deploy a new model”.

I agree with this description, but I'm not sure we really want our AI agents evolving in real time as they gain experience. Having a static model that is thoroughly tested before deployment seems much safer.
> Having a static model that is thoroughly tested before deployment seems much safer.

While that might be true, it fundamentally means it's never going to replicate human intelligence or provide super intelligence.

> While that might be true, it fundamentally means it's never going to replicate human intelligence or provide super intelligence.

Many people would argue that's a good thing

In the interview transcript, he seems aware that the field is doing RL, and he makes a compelling argument that bootstrapping isn’t as scalable as a purely RL trained AI would be.

Let’s not overstate what the technology actually is. LLMs amount to random token generators that try their best to have their outputs “rhyme” with their prompts, instructions, skills, or what humans know as goals and consequences.

It does a lot more than that.

It’s literally a slot machine for random text. With “services around it” to give the randomness some shape and tools.

It is literally not. 2/3 of the weights are in the multi-layer perceptron, which is a dynamic information encoding and retrieval machine. And the attention mechanisms allow for very complex data interrelationships.

At the very end of an extremely long and sophisticated process, the final mapping is softmax transformed and the distribution sampled. That is one operation among hundreds of billions leading up to it.

It’s like saying a Jeopardy player is a random word generating machine — they see a question and generate “what is ” followed by a random word, random because there is some uncertainty in their mind even in the final moment. That is technically true, but incomplete, and entirely missing the point.
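For what it's worth, here is roughly what that final sampling step looks like, a toy Python sketch (made-up logits for a five-token vocabulary, standard temperature-scaled softmax), just to show how small it is next to everything that produces the logits:

    import numpy as np

    # Toy logits for a five-token vocabulary: the output of the hundreds of
    # billions of operations that came before this step.
    logits = np.array([2.1, 0.3, -1.0, 1.7, 0.2])
    temperature = 0.8

    # Softmax turns the logits into a probability distribution...
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # ...and the "randomness" people fixate on is a single draw from it.
    next_token = np.random.choice(len(probs), p=probs)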

> LLMs are already being trained with RL to have goal directedness.

That might be true, but we're talking about the fundamentals of the concept. His argument is that you're never going to reach AGI/super intelligence on an evolution of the current concepts (mimicry) even through fine tuning and adaptations - it'll likely be different (and likely based on some RL technique). At least we have NO history to suggest this will be the case (hence his argument for "the bitter lesson").

The LLMs don't have RL baked into them. They need that at the token prediction level to be able to do the sort of things humans can do.

Explain something to me that I've long wondered: how does Reinforcement Learning work if you cannot measure your distance from the goal? In other words, how can RL be used for literally anything qualitative?
This is one of the known hardest parts of RL. The short answer is human feedback.

But this is easier said than done. Current models require vastly more learning events than humans, making direct supervision infeasible. One strategy is to train models to imitate human supervisors, so those models can bear the bulk of the supervision. This is tricky, but has proven more effective than direct supervision.
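To make the human-feedback part concrete, here is a minimal toy sketch of the usual reward-model idea (pairwise preferences, Bradley-Terry style loss, a linear scorer over made-up feature vectors standing in for a real transformer):

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 8

    # Hidden "human taste" used only to label the toy data; in reality the
    # labels come from people comparing two model responses.
    true_w = rng.normal(size=dim)
    pairs = []
    for _ in range(200):
        a, b = rng.normal(size=dim), rng.normal(size=dim)
        chosen, rejected = (a, b) if true_w @ a > true_w @ b else (b, a)
        pairs.append((chosen, rejected))

    # Fit a scalar reward so the chosen response scores above the rejected
    # one (Bradley-Terry / logistic loss). Real reward models are transformers.
    w = np.zeros(dim)
    for _ in range(500):
        grad = np.zeros(dim)
        for chosen, rejected in pairs:
            p = 1.0 / (1.0 + np.exp(-(w @ chosen - w @ rejected)))
            grad += (p - 1.0) * (chosen - rejected)   # gradient of -log p
        w -= 0.05 * grad / len(pairs)

    # The learned reward then stands in for the human across millions of
    # RL rollouts, which is how the feedback scales.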

But, in my experience, AIs don't specifically struggle with the "qualitative" side of things per se. In fact, they're great at things like word choice, color theory, etc. Rather, they struggle to understand continuity, consequence, and how to combine disparate sources of input. They also suck at differentiating fact from fabrication. To speculate wildly, it feels like they're missing the RL of living in the "real world". In order to eat, sleep and breathe, you must operate within the bounds of physics and society and live forever with the consequences of an ever-growing history of choices.

Whenever I watch Claude Code or Codex get stuck trying to force a square peg into a round hole and failing over and over it makes me wish that they could feel the creeping sense of uncertainty and dread a human would in that situation after failure after failure.

Which eventually forces you to take a step back and start questioning basic assumptions until (hopefully) you get a spark of realization of the flaws in your original plan, and then recalibrate based on that new understanding and tackle it totally differently.

But instead I watch Claude struggling to find a directory it expects to see and running random npm commands until it comes to the conclusion that, somehow, node_modules was corrupted mysteriously and therefore it needs to wipe everything node related and manually rebuild the project config by vague memory.

Because no big deal, if it’s wrong it’s the human's problem to untangle and Anthropic gets paid either way so why not try?

This 100%.

While we might agree that language is foundational to what it is to be human, it's myopic to think it's the only thing. LLMs are based on training sets of language (period).

RL works great on verifiable domains like math, and to some significant extent coding.
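"Verifiable" here just means the reward can be computed mechanically, with no human judging the output. A minimal sketch for code (hypothetical generated function and hand-written tests):

    # Reward for RL on code: 1.0 if the model's program passes the tests,
    # else 0.0. No human judgment involved, which is what "verifiable" means.
    def reward(candidate_source: str) -> float:
        namespace = {}
        try:
            exec(candidate_source, namespace)           # load the generated code
            assert namespace["add"](2, 3) == 5          # hypothetical unit tests
            assert namespace["add"](-1, 1) == 0
            return 1.0
        except Exception:
            return 0.0

    print(reward("def add(a, b):\n    return a + b"))   # 1.0
    print(reward("def add(a, b):\n    return a - b"))   # 0.0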

Coding is an interesting example because as we change levels of abstraction from the syntax of a specific function to, say, the architecture of a software system, the ability to measure verifiable correctness declines. As a result, RL-tuned LLMs are better at creating syntactically correct functions but struggle as the abstraction layer increases.

In other fields, it is very difficult to verify correctness. What is good art? Here, LLMs and their ilk can still produce good output, but it becomes hard to produce "superhuman" output, because in nonverifiable domains their capability is dependent on mimicry; it is RL that gives the AI the ability to perform at superhuman levels. With RL, rather than merely fitting its parameters to a set of extant data it can follow the scent of a ground truth signal of excellence. No scent, no outperformance.

I can't wait to try to convince an LLM/RL/whatever-it-is that what it "thinks" is right is actually wrong.
So it’s on-the-fly adaptive mimicry?
The industry has been doing RL on many kinds of neural networks, including LLMs, for quite some time. Is this person saying we RL on some kind of non-neural-network design? Why is that more likely to bring AGI than an LLM?

> More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence.

Citation?

Looks like they added the link. But I think it’s doing RL in realtime vs pre-trained as an LLM is.

And I associate that part with AGI being able to do cutting-edge research and explore new ideas like humans can. Whereas, when that seems to “happen” with LLMs, it’s been more debatable (e.g. there was an existing paper that the LLM was able to tap into).

I guess another example would be to get an AGI doing RL in realtime to get really good at a video game with completely different mechanics in the same way a human could. Today, that wouldn’t really happen unless it was able to pre-train on something similar.

I don't think any of the commercial models are doing RL at the consumer end. The R is just the user accepting or rejecting the action, right?

Why are you asking them to cite something for that statement? Are you questioning whether it's the foundation for intelligence or whether LLMs understand goals and consequences?

Yes, I'm questioning if that's the foundation of intelligence. Says who?

Richard Sutton. He won a Turing Award. Why ask your question above when you can just watch the YouTube link I posted?

Besides a "reference manual", Claude Skills is analogous to a "toolkit with an instruction manual" in that it includes both instructions (manuals) and executable functions (tools/code).
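Roughly, per the documented layout (a folder per skill, a SKILL.md whose frontmatter carries a name and description, plus any bundled scripts): only the blurbs are loaded up front, and the full manual and tools are pulled in when the skill is invoked. A simplified sketch of that two-step loading, with hypothetical paths and naive frontmatter parsing:

    from pathlib import Path

    SKILLS_DIR = Path(".claude/skills")   # hypothetical project layout

    def frontmatter(skill_md: Path) -> dict:
        """Read the name/description blurb from a SKILL.md header."""
        meta, lines = {}, skill_md.read_text().splitlines()
        if lines and lines[0].strip() == "---":
            for line in lines[1:]:
                if line.strip() == "---":
                    break
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
        return meta

    # Step 1: only the short descriptions go into context, so the model can
    # decide whether a skill is relevant.
    blurbs = {p.parent.name: frontmatter(p) for p in SKILLS_DIR.glob("*/SKILL.md")}

    # Step 2: once a skill is chosen, the full manual (and any bundled
    # scripts) is loaded: the "toolkit with an instruction manual".
    def load_skill(name: str) -> str:
        return (SKILLS_DIR / name / "SKILL.md").read_text()
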
For humans, it’s not uncommon to have a clever realization by way of serendipity. How do you skill an AI to have serendipity?
I would love to understand where this notion of LLMs becoming AGI ever came from.

ChatGPT broke open the dam to massive budgets for AI/ML, and LLMs will probably be a puzzle piece of AGI. But otherwise?

I mean, it should be clear that we have so much work still to do, like RL (which now happens, by the way, on a massive scale because you thumb up or down every day), thinking, Mixture of Experts, tool calling and, super super critical: architecture.

Compute is a hard upper limit too.

And the math isn't done either. The performance of context length has advanced, and we also saw other approaches like diffusion-based models.

Whenever you hear the leading experts talking, they mention world models.

We are still in a phase where we have plenty of very obvious ideas people need to try out.

But the quality of Whisper alone, LLMs as an interface, and tool calling can solve problems in robotics and the like that no one was able to solve this easily before.

This is an uninformed take. Much of the improvement in performance of LLM based models has been through RLHF and other RL techniques.
> This is an uninformed take.

You may disagree with this take but it's not uninformed. Many LLMs use self‑supervised pretraining followed by RL‑based fine‑tuning, but that's essentially it - it's fine-tuning.

I think you're seriously underestimating the importance of the RL steps on LLM performance.

Also how do you think the most successful RL models have worked? AlphaGo/AlphaZero both use Neural Networks for their policy and value networks which are the central mechanism of those models.

IMO this is a context window issue. Humans are pretty good at memorizing super broad context without great accuracy. Sometimes our "recall" function doesn't even work right ("How do you say 'blah' in German again?"), so the more you specialize (say, 10k hours / mastery), the better you are at recalling a specific set of "skills", but perhaps not other skills.

On the other hand, LLMs have a programmatic context with consistent storage and the ability to have perfect recall; they just don't always generate the expected output in practice, as the cost to go through ALL context is prohibitive in terms of power and time.

Skills - or really, just context insertion - are simply a way to prioritize their output generation manually. LLM "thinking mode" is the same, for what it's worth - it really is just reprioritizing context - so not "starting from scratch" per se.

When you start thinking about it that way, it makes sense - and it helps with using these tools more effectively too.

I commented here already about deli-gator (https://github.com/ryancnelson/deli-gator), but your summary nailed what I didn’t mention here before: Context.

I’d been re-teaching Claude to craft REST API calls with curl every morning for months before I realized that skills would let me delegate that to cheaper models, re-using cached-token queries, and save my context window for my actual problem-space CONTEXT.

>I’d been re-teaching Claude to craft REST API calls with curl every morning for months

what the fuck, there is absolutely no way this was cheaper or more productive than just learning to use curl and writing curl calls yourself. Curl isn't even hard! And if you learn to use it, you get WAY better at working with HTTP!

You're kneecapping yourself to expend more effort than it would take to just write the calls, helping to train a bot to do the job you should be doing

My interpretation of the parent comment was that they were loading specific curl calls into context so that Claude could properly exercise the endpoints after making changes.
He’s likely talking about Claude’s hook system that Anthropic created to provide better control over context.
I know how to use curl. (I was a contributor before git existed.) … Watching Claude iterate to re-learn whether to try application/x-www-form-urlencoded or GET /?foo wastes SO MUCH time and fills your context with “how to curl” that you re-send over and over again until your context compacts.

You are bad at reading comprehension. My comment meant I can tell Claude “update jira with that test outcome in a comment” and, Claude can eventually figure that out with just a Key and curl, but that’s way too low level.

What I linked to literally explains that, with code and a blog post.
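For context, the call being delegated is tiny once it's captured; something along these lines (hypothetical site, issue key, and token; Jira's v2 comment endpoint), which is exactly the sort of thing you don't want re-derived from scratch every session:

    import requests

    # Hypothetical site, issue key and credentials; the skill captures this
    # once instead of Claude rediscovering headers and auth every session.
    site, issue = "https://example.atlassian.net", "PROJ-123"
    auth = ("me@example.com", "api-token-here")

    resp = requests.post(
        f"{site}/rest/api/2/issue/{issue}/comment",
        json={"body": "Integration tests passed on the latest build."},
        auth=auth,
        timeout=30,
    )
    resp.raise_for_status()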

> IMO this is a context window issue.

Not really. It's an issue of consequences. No matter how big or small the context window is, LLMs simply do not have the concept of goals and consequences. Thus, it's difficult for them to acquire dynamic and evolving "skills" like humans do.

There are ways to compensate for lack of “continual learning”, but recognizing that underlying missing piece is important.
Worth noting, even though it isn’t critical to your argument, that LLMs do not have perfect recall. I go to great lengths to keep agentic tools from relying on memory, because they often get it subtly wrong.
Would this requirement to start from ground zero in current LLMs be an artefact of the requirement to have a "multi-tenant" infrastructure?

Of course OpenAI and Anthropic want to be able to reuse the same servers/memory for multiple users, otherwise it would be too expensive.

Could we have "personal" single-tenant setups? Where the LLM incorporates every previous conversation?

This is the crux of knowledge/tool enrichment in LLMs. The idea that we can have knowledge bases and LLMs will know WHEN to use them is a bit of a pipe dream right now.
Can you be more specific? The simple case seems to be solved, eg if I have an mcp for foo enabled and then ask about a list of foo, Claude will go and call the list function on foo.
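For reference, the simple case really is just a described tool. A sketch using the MCP Python SDK's FastMCP helper (hypothetical "foo" data; the docstring is the blurb the model matches against):

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("foo-server")            # hypothetical server name
    FOOS = ["alpha", "beta", "gamma"]      # stand-in data store

    @mcp.tool()
    def list_foos() -> list[str]:
        """List all foos currently known to the system."""
        return FOOS

    if __name__ == "__main__":
        mcp.run()   # "give me a list of foos" should now route here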
> […] and then ask about a list of foo

Not OP, but this is the part that I take issue with. I want to forget what tools are there and have the LLM figure out on its own which tool to use. Having to remember to add special words to encourage it to use specific tools (required a lot of the time, especially with esoteric tools) is annoying. I’m not saying this renders the whole thing “useless” because it’s good to have some idea of what you’re doing to guide the LLM anyway, but I wish it could do better here.

I've got a project that needs to run a special script and not just "make $target" at the command line in order to build, and with instructions in multiple .md files, codex w/ gpt-5-high still forgets and runs make blindly, which fails, and it gets confused annoyingly often.

ooh, it does call make when I ask it to compile, and is able to call a couple other popular tools without having to refer to them by name. If I ask it to resize an image, it'll call ImageMagick, or run ffmpeg, and I don't need to refer to ffmpeg by name.

so at the end of the day, it seems they are their training data, so better write a popular blog post about your one-off MCP and the tools it exposes, and maybe the next version of the LLM will have your blog post in the training data and will automatically know how to use it without having to be told

Yeah, I've done this just now.

I installed ImageMagick on Windows.

Created a ".claude/skills/Image Files/" folder

Put an empty SKILLS.md file in it

and told Claude Code to fill in the SKILLS.md file itself with the path to the binaries.

and it created all the instructions itself including examples and troubleshooting

and in my project prompted

"@image.png is my base icon file, create all the .ico files for this project using your image skill"

and it all went smoothly
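For the curious, the skill presumably bottoms out in something like this (hypothetical file names and sizes; ImageMagick 7's magick binary with its ICO auto-resize define):

    import subprocess

    # Hypothetical file names; the sizes are a guess at what the project needs.
    subprocess.run(
        [
            "magick", "image.png",
            "-define", "icon:auto-resize=256,64,48,32,16",
            "app.ico",
        ],
        check=True,
    )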

It doesn't reliably do it. You need to inject context into the prompt to instruct the LLM to use tools/kb/etc. It isn't deterministic whether or when it will follow through.

LLMs are a probability-based calculation, so they will always skim to some degree, and always guess to some degree, and will often pick what looks like the best choice available to them even though it might not actually be the best.

For folks for whom this seems elusive, it's worth learning how the internals actually work; it helps a great deal in how to structure things in general, and then over time, as the parent comment said, for individual cases specifically.

Most of the experience is general information not specific to the project/discussion. The LLM starts with all that knowledge. Next it needs a memory and lookup system for project-specific information. Lookup in humans is amazingly fast, but even with a slow lookup, LLMs can refer to it in near real-time.
The blurbs can be improved if they aren't effective. You can also invoke skills directly.

The description is equivalent to your short term memory.

The skill is like your long term memory which is retrieved if needed.

These should both be considered as part of the AI agent. Not external things.

> starting from ground zero

You probably mean "starting from square one" but yeah I get you

Skills are literally technical documentation for your project, it seems. So now we can finally argue for time to write docs; just name it "AI-enhancing skill definitions".
Excellent point. Put simply, building those preferences and lessons would demand a layer of latent memory and personal models; maybe now is a good time to revisit this idea...
Humans don't need a skill to know that they need a skill.
