I think there is still widespread confusion between two slightly different concepts that the author also tripped over.

If you ask an LLM a question, get the answer, and then ask how it got to that answer, it will make something up, because it literally cannot do otherwise: there is no hidden memory space in which the LLM could do its calculations, record which calculations it performed, and then consult that record to answer the second question. All there is are the tokens.

However, if you tell the model to "think step by step", i.e. to first make a number of small inferences and then use those to derive the final answer, you should (at least in theory) get a high-level description of the actual reasoning process, because the model uses the tokens of its intermediate results to generate the features for the final result.

So "explain how you did it" will give you bullshit, but "think step by step" should work.

And as far as my understanding goes, the "reasoning models" are essentially just heavily fine-tuned for step-by-step reasoning.


> that the author also tripped over

The evidence for unfaithful reasoning comes from Anthropic: it is in their system card and in this Anthropic paper.

https://assets.anthropic.com/m/71876fabef0f0ed4/original/rea...
