If I agree with your definition of hallucinations in the context of LLMs... then isn't your second paragraph just describing a way to artificially increase the likelihood of them occurring?
You seem to differentiate between a hallucination caused by poisoning the dataset and one produced from correct data, but can you honestly make such a distinction considering just how much data goes into these models?
Frankly, "hallucination" as used with LLMs today is not really a technical term at all. It just means "this particular randomly sampled stream of language produced sentences that communicate falsehoods."
There's a strong argument to be made that the word is dangerously misleading, because it implies there's some difference between how the model functions while producing a hallucinatory sample versus a non-hallucinatory one. There isn't. LLMs produce streams of language sampled from a probability distribution. As an unexpected side effect of producing coherent language, these streams will often contain factual statements. Other times the stream contains statements that are untrue. "Hallucination" doesn't really exist as an identifiable concept within the architecture of the LLM; it's just a somewhat subjective judgement by humans about the language stream.
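To make that concrete, here's a toy sketch in pure Python (the tokens and probabilities are made up for illustration, not taken from any real model): the factually correct continuation and the false ones fall out of the exact same sampling step, and nothing in the mechanism marks either outcome as a "hallucination".

```python
# Toy sketch, not a real model: a "true" and a "false" continuation both
# come out of the same sampling step; the mechanism never labels either one.
import random

# Hypothetical next-token distribution after the prompt
# "The capital of Australia is" -- the numbers are invented.
next_token_probs = {
    "Canberra": 0.55,    # factually correct continuation
    "Sydney": 0.35,      # plausible but false continuation
    "Melbourne": 0.10,   # also plausible but false
}

def sample_next_token(probs):
    """Sample one token from the distribution -- the only thing the model does."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Run the identical sampling step many times: sometimes the stream is
# "factual", sometimes it "hallucinates", but the procedure never changes.
counts = {token: 0 for token in next_token_probs}
for _ in range(10_000):
    counts[sample_next_token(next_token_probs)] += 1
print(counts)
```

Whether a given run counts as a hallucination is decided after the fact by a human comparing the output to the world, not by anything inside the sampling loop.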
So much mangling of meaning.
For example, the “AI” that detects spam is very different from an LLM.
In your example, that's just pollution of the training set by spam, but that's not much of an issue in practice; AI has been better than humans at classifying spam for over a decade now.
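For what it's worth, the kind of “AI” behind spam filtering is typically a discriminative classifier: it maps a message to a spam/ham label rather than sampling language, so there is no generated stream to hallucinate in the first place. A minimal sketch of that kind of model (using scikit-learn and a made-up toy dataset, purely to illustrate the contrast):

```python
# Minimal sketch of the kind of "AI" behind spam filtering: a discriminative
# classifier that assigns a spam/ham label, not a generative language model.
# Uses scikit-learn; the training data here is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "Win a free iPhone now, click here",
    "Limited offer: cheap meds, no prescription",
    "Lunch tomorrow at noon?",
    "Here are the meeting notes from Tuesday",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words features + naive Bayes: the classic spam-filter recipe.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(messages, labels)

print(classifier.predict(["Click here for a free offer"]))      # likely 'spam'
print(classifier.predict(["Can we move the meeting to 3pm?"]))  # likely 'ham'
```

Poisoning that training set shows up as a measurable drop in classification accuracy on held-out mail, which is a very different failure mode from the after-the-fact human judgement that “hallucination” names.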