Or is there a more subtle issue that prevents this or makes it hard?
Is there something fundamentally impossible about having a model recognize that counting the Rs in 'strawberry' is a string-search operation and then, in some sandbox, execute something like:
% echo "strawberry" | tr -dc "r" | wc -c
3
It seems agents do this already, but regular GPT-style environments seem to lack it? Anyway, let me refresh my page, as I am sure some new model architecture is dropping while I type this. ;)
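For what it's worth, here is a minimal sketch of what that routing could look like, in Python with entirely made-up function names and a toy pattern matcher (not any real agent framework's API): detect that the question is a letter-counting task and hand it to an exact string operation instead of answering from token statistics.

    import re

    def count_letter(word: str, letter: str) -> int:
        # The exact "sandboxed tool": plain character counting.
        return word.lower().count(letter.lower())

    def answer(question: str) -> str:
        # Toy detector: if the question looks like 'how many <letter>s in <word>',
        # dispatch to the exact tool; otherwise fall back to the model.
        m = re.search(r"how many ['\"]?(\w)['\"]?s? .* ['\"]?(\w+)['\"]?", question, re.I)
        if m:
            return str(count_letter(m.group(2), m.group(1)))
        return "(fall back to the language model)"

    print(answer('How many "r"s are in "strawberry"?'))  # prints 3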
Start with Jacob.
Jacob’s son → call him A.
A’s son → call him B.
B’s son → call him C.
C’s son → call him D (this is “the son of Jacob’s son’s son’s son”).
Now the question asks for the paternal great-great-grandfather of D:
D’s father → C
D’s grandfather → B
D’s great-grandfather → A
D’s great-great-grandfather → Jacob
Answer: Jacob
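If it helps, the chain is trivial to sanity-check mechanically. A toy sketch in Python, using the placeholder names A–D from above (obviously not how an LLM produces the answer):

    # Child -> father mapping for the chain Jacob -> A -> B -> C -> D
    # (each arrow is "father of").
    fathers = {"A": "Jacob", "B": "A", "C": "B", "D": "C"}

    def paternal_ancestor(person: str, generations: int) -> str:
        # Walk up the paternal line the given number of generations.
        for _ in range(generations):
            person = fathers[person]
        return person

    # D is the son of Jacob's son's son's son; his great-great-grandfather
    # is four generations up the paternal line.
    print(paternal_ancestor("D", 4))  # prints Jacob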
Also, next time you should at least bother to copy-paste your question into any recent LLM, since they can all solve it without issue. But hallucinations like this are common with non-reasoning HN users.
Don’t think so. Humans solve that puzzle in a very different way from how LLMs ”reason” about it.
(DeepThink did wonder if it was supposed to be him afterwards or if it was a trick.)
Adding a second question like ”Is Abraham included in the family tree?” still makes it regress into mentioning Isaac, Judah, Joseph, 12 sons and whatnot.
There may be additional feedback loops, but fundamentally that is what it is doing. Sure, it will show you the steps it takes to arrive at a conclusion, but it is just predicting the steps, the conclusion, and the likely validity of both from its training data; it is not actually evaluating the logic or the truthiness of the output.
If you don’t believe me, ask your ”reasoning” LLM this question: What’s the name of the paternal great-great-grandfather of the son of Jacob’s son’s son’s son?