Comment by maplethorpe

maplethorpe Oct 14, 2025 parent

I read the page you linked, but I'm still not understanding why hallucination isn't an inevitability of LLMs. The explanation OpenAI gives doesn't feel like a complete answer to me.

OpenAI claims that hallucination isn't an inevitability because you can train a model to "abstain" rather than "guess" when giving an "answer". But what does that look like in practice?

My understanding is that an LLM's purpose is to predict the next token in a list of tokens. To prevent hallucination, does that mean it is assigning a certainty rating to the very next token it's predicting? How can a model know if its final answer will be correct if it doesn't know what the tokens that come after the current one are going to be?

Or is the idea to have the LLM generate its entire output, assign a certainty score to that, and then generate a new output saying "I don't know" if the certainty score isn't high enough?

ACCount37 Oct 14, 2025

The answer is that it does know. Not exactly, but the "general shape" of the answer is known to the LLM before the very first token of the answer is emitted!

"Next token prediction" is often overstated - "pick the next token" is the exposed tip of a very large computational process.

And LLMs are very sharp at squeezing the context for every single bit of information available in it. Much less so at using it in the ways you want them to.

There's enough information at "no token emitted yet" for an LLM to start steering the output towards "here's the answer" or "I don't know the answer" or "I need to look up more information to give the answer" immediately. And if it fails to steer it right away? An LLM optimized for hallucination avoidance could still go "fuck consistency drive" and take a sharp pivot towards "no, I'm wrong" mid-sentence if it had to. For example, if you took control and forced a wrong answer by tampering with the tokens directly, then handed the control back to the LLM.

maplethorpe OP Oct 14, 2025

By "shape" of the answer, what do you mean? I always visualized token prediction as a vector pointing off into some sort of cloud of related tokens, and if that's a fair way to visualize it, I could understand how you could say, before even emitting the first token of the answer, "we are pointing towards generally the correct place where the answer is found". But when a single token can make or break an answer, I still don't see how you can truly know whether the answer is correct until the very last token is reached. Because of this, I'm still not convinced hallucination can be stopped.

Can you help correct where I'm going wrong?

This item has no comments currently.

Preferences

Keyboard Shortcuts

Story Lists

Navigation

Miscellaneous