roywiggins
LLMs will write out explanations that are entirely post-hoc:

> Strikingly, Claude seems to be unaware of the sophisticated "mental math" strategies that it learned during training. If you ask how it figured out that 36+59 is 95, it describes the standard algorithm involving carrying the 1. This may reflect the fact that the model learns to explain math by simulating explanations written by people, but that it has to learn to do math "in its head" directly, without any such hints, and develops its own internal strategies to do so.

https://www.anthropic.com/news/tracing-thoughts-language-mod...

It seems to be about as useful as asking a person how their hippocampus works: they might be able to make something up, or repeat a vaguely remembered bit of neuroscience, but they don't actually have access to their own hippocampus' internal workings, so if they're correct it's by accident.
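For reference, here is a minimal sketch of the "standard algorithm involving carrying the 1" that Claude reports using for 36+59. This is the human schoolbook method the quote says the model describes, not a model of what the network actually computes internally. The function name `add_with_carry` is a hypothetical illustration.

```python
def add_with_carry(a: int, b: int) -> tuple[int, list[int]]:
    """Schoolbook column addition of two non-negative integers.

    Hypothetical helper for illustration: returns the sum and the carry
    produced at each column, ones column first.
    """
    result_digits = []
    carries = []
    carry = 0
    while a > 0 or b > 0 or carry:
        column = a % 10 + b % 10 + carry   # add this column's digits plus any carry-in
        result_digits.append(column % 10)  # write down the ones digit of the column
        carry = column // 10               # "carry the 1" into the next column
        carries.append(carry)
        a //= 10
        b //= 10
    # Reassemble the digits (stored ones-first) into an integer.
    total = 0
    for digit in reversed(result_digits):
        total = total * 10 + digit
    return total, carries


total, carries = add_with_carry(36, 59)
print(total)    # 95
print(carries)  # [1, 0]  (6+9=15 carries a 1; 3+5+1=9 carries nothing)
```

The point of the comparison is that this step-by-step procedure is what a person would write down, which is why it is a plausible thing for a model trained on human explanations to produce, whether or not its internal computation resembles it.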
