> that the author also tripped over
The evidence for unfaithful reasoning comes from Anthropic. It is in their system card and this Anthropic paper.
https://assets.anthropic.com/m/71876fabef0f0ed4/original/rea...
If you ask an LLM a question, get the answer, and then ask how it arrived at that answer, it will make something up, because it literally can't do otherwise: there is no hidden memory space in which the LLM could do its calculations, record which calculations it performed, and then consult that record to answer the second question. All there is are the tokens.
However, if you tell the model to "think step by step", i.e. to first make a number of small inferences and then use those to derive the final answer, you should (at least in theory) get a high-level description of the actual reasoning process, because the model will use the tokens of its intermediate results to generate the features for the final result.
So "explain how you did it" will give you bullshit, but "think step by step" should work.
And as far as my understanding goes, the "reasoning models" are essentially just heavily fine-tuned for this kind of step-by-step reasoning.