https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
and fairly often in comments as well:
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
https://en.wikipedia.org/wiki/I_Have_No_Mouth,_and_I_Must_Sc...
Consider it proof that HN has indeed not become reddit, I guess :)
Maybe I should give it another go as I do love the short story and it used to be my favourite before discovering Ted Chiang's work.
Yeah well there seem to be some real concerns regarding how people use AI chat[1]. Of course this could also be the case with these people on social media.
Isn't that classic hallucination? Making up something like a plausible truth.
If normal hallucination is being confidently wrong, this is like a stage hypnotist getting someone to forget the number 4 and then count their fingers.
In this case:
1. The stored weights statistically indicate that a seahorse emoji is quite certain to exist. Through the training data it has probably learned something like Emoji + Seahorse -> 99% probability via various channels: either it has existed on some other platform, or people have talked about it enough, or a seahorse is something you would expect to exist as an emoji because of its other attributes/characteristics. There are about 4,000 emojis, but storing all 4,000 individually takes a lot of space; it would be easier to store this information by attributes: how likely humankind would have been to develop a certain emoji, what the demand for that type of emoji is, and a seahorse seems like something that would have been made within the first 1,000. Perhaps it's an anomaly in the sense that it's something humans would have been statistically expected to develop early, but for some reason it was skipped or went unnoticed.
2. The tokens that follow should be "Yes, it exists" (a rough way to peek at this probability is sketched after the list).
3. It should then output the emoji to show it exists, but since there is no correct emoji, the best available answers are those closest in meaning, e.g. just a horse, or something sea-related. It will output one of those, since the previous tokens indicate it was supposed to output something.
4. The next token is generated with the context that it previously said the emoji exists, yet the token it actually output was a horse emoji instead, which doesn't make sense.
5. Here it goes into this tirade.
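A minimal sketch of how you could peek at the "Yes" probability in step 2, using the Hugging Face transformers library; the model name and prompt are arbitrary placeholders, and a small model won't necessarily reproduce the seahorse behaviour:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM from the Hub works the same way
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt = "Question: Is there a seahorse emoji? Answer:"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Compare the mass the model puts on a "Yes" vs "No" continuation.
for word in [" Yes", " No"]:
    token_id = tok.encode(word)[0]           # first sub-token is enough for a rough check
    print(f"{word!r}: {next_token_probs[token_id].item():.4f}")
```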
But I really dislike thinking of this as "hallucinating", because to me a hallucination is a sensory processing error. This is more like imperfect memory recall (like people remembering facts slightly incorrectly), or whatever happens when people are expected to give a detailed account of something that happened in their life and have been trained not to say "I don't remember for sure".
What did you eat for lunch 5 weeks ago on Wednesday?
You are rewarded for saying "I ate chicken with rice", but not "I don't remember right now for sure, but I frequently eat chicken with rice during mid week, so probably chicken with rice."
You are not hallucinating, you are just getting brownie points for concise, confident answers if they cross a certain likelihood of being true. Because maybe you eat chicken with rice on 99%+ of Wednesdays.
When asked about the capital of France, you would surely sound dumb if you said "I'm not really sure, but I've been trained to associate Paris really, really closely with being the capital of France."
"Hallucination" happens in the sweet spot where the statistical threshold makes something seem like an obvious truth, but in some cases what seems like an obvious truth overlaps with things that are actually not true.
Some have instead called it "confabulation", but I think that is also not 100% accurate, since confabulation implies a stricter memory malfunction. I think the most accurate description is that it's a probability-based database whose output has been rewarded for sounding as intelligent as possible. The same thing happens in job interviews, group meetings, and high-pressure social situations where people think they have to sound confident: they bluff that they know something while actually making probability-based guesses underneath.
Confabulation implies there was some clear error in how the data was stored or that a pathway got messed up. But this is probability-based bluffing, because you get rewarded for confident answers.
I think this is the same thing that is happening with the sea horse. The only difference is that the model detects the incorrect encoding on its own, so it starts trying to correct itself without you complaining first.
Would anyone really think you sounded dumb for saying "I am not really sure - I think there is a seahorse emoji but it's not commonly used" ?
AAAAAAUUUGH!!!!!! (covers ears)
But it doesn't realize that it can't write it, because it can't learn from this experience; it doesn't have introspection the way humans do. A human who can no longer move their finger won't say "here, I can move my finger: " over and over and never learn that they can't move it now; after a few times they will figure out they can no longer do that.
I feel this sort of self-reflection is necessary to match human-level intelligence.
A frozen version number doesn't; what happens between versions certainly includes learning from user feedback on the responses as well as from the chat transcripts themselves.
Until we know how human introspection works, I'd only say Transformers probably do all their things differently than we do.
> A human who can no longer move their finger won't say "here, I can move my finger: " over and over and never learn that they can't move it now; after a few times they will figure out they can no longer do that.
Humans are (like other mammals) a mess: https://en.wikipedia.org/wiki/Phantom_limb
LLMs produce linguistically valid texts, not factually correct texts. They are probability functions, not librarians.
LLMs have nondeterministic properties intrinsic to their macro behaviour. If you've ever tweaked the "temperature" of an LLM, that's what you are tweaking.
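For anyone who hasn't tweaked it: a toy sketch of what the temperature knob does, with made-up logits (it rescales the logits before softmax, so low temperature makes sampling near-deterministic and high temperature spreads it out):

```python
import numpy as np

def token_probs(logits, temperature=1.0):
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                    # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [4.0, 3.5, 1.0]                      # scores for three candidate tokens (invented)
for t in (0.2, 1.0, 2.0):
    probs = token_probs(logits, temperature=t)
    print(t, probs.round(3))                  # 0.2 -> top token dominates; 2.0 -> much flatter
```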
LLMs are afaik usually evaluated nondeterministically because they're floating point and nobody wants to bother perfectly synchronizing the order of operations, but you can do that.
Or you can do the opposite: https://github.com/EGjoni/DRUGS
If we have a pipeline: `training => internal representation => behavior`, your explanation argues that the given training setup would always result in this behavior, no matter the internal representation. Llamasushi explains how the concrete learned representation leads to this behavior.
I would think that, due to the training data, it has stored the likelihood of a certain thing existing as an emoji as something like:
1. how appealing seahorses are to humans in general - it would learn this sentiment from a massive amount of text.
2. it would learn from a massive amount of text that emojis mostly represent things that are very appealing to humans.
3. for some of the more obvious emojis it might have learned that this one is definitely there, but it couldn't store that info for all 4,000 emojis.
4. for many emojis, whether one exists comes down to shortcut logic: how appealing the concept is vs. how frequently something that appealing gets represented as an emoji. Seahorse perhaps hits 99.9% likelihood there due to strong appeal. In 99.9% of such cases the LLM would be right to answer "Yes, it ...", but there will always be 1 in 1,000 cases where it's wrong.
With this compression it's able to answer "Yes, it exists ..." correctly 999 times out of 1,000.
It could be more accurate if it said "A seahorse would have a lot of appeal for people, so it's very likely it exists as an emoji, since emojis are usually made for very high-appeal concepts first, but I don't know anything with 100% certainty, so it could be it was never made".
But in 999 of those cases, "Yes it exists..." is the more straightforward and appreciated answer. The one time it's wrong is going to cost fewer brownie points than the 999 short, confident answers gain over 1,000 technically accurate but non-confident answers.
But even the above sentence might not be the full truth, since it might not actually know why it has associated the seahorse with being so likely to exist; it would just be speculating. So maybe the more accurate answer would be "I expect a seahorse emoji likely exists, maybe because of how appealing it is to people and how emojis are usually about appealing things".
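A toy sketch of that trade-off; every name and number below is invented, it's just to make the "store attributes, not the full list" idea concrete:

```python
# Instead of memorizing which of ~4,000 emojis exist, store a cheap rule
# based on a couple of attributes and accept that it misfires occasionally.
def estimated_p_exists(appeal, category_coverage):
    return appeal * category_coverage         # crude stand-in for the learned shortcut

concepts = {                                  # (appeal, how well that category is covered)
    "horse":    (0.9, 0.95),
    "seahorse": (0.9, 0.95),                  # looks just as likely by attributes alone...
    "tax form": (0.1, 0.30),
}
actually_exists = {"horse": True, "seahorse": False, "tax form": False}

for name, (appeal, coverage) in concepts.items():
    guess = estimated_p_exists(appeal, coverage) > 0.5
    print(f"{name}: guessed {guess}, actually {actually_exists[name]}")
# The rule is right most of the time, but "seahorse" is exactly the kind of
# high-appeal outlier where the compressed representation confidently fails.
```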
- add in some smallish portion of the data set
- have LM trainers (actual humans) interact with it and provide feedback about where the LM is factually incorrect and provide it additional information as to why
- add those chat logs into the remaining data set
- rinse and repeat until the LM is an LLM
Would they be any more reliable in terms of hallucinations and factual correctness? This would replicate, to some extent, how people learn things. It would probably really slow things down (not scale), and the trainers would need to be subject-matter experts rather than random people on the net saying whatever they want to it as it develops, or it will just spiral out of control.
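A rough sketch of the loop being proposed; train_on and collect_expert_feedback are stand-in stubs, not any real training API:

```python
def train_on(model, chunk):
    return model                       # stand-in for a real fine-tuning step

def collect_expert_feedback(model, trainers):
    return []                          # stand-in for expert-corrected chat transcripts

def incremental_training(model, full_dataset, chunk_size, trainers):
    remaining = list(full_dataset)
    while remaining:
        chunk, remaining = remaining[:chunk_size], remaining[chunk_size:]
        model = train_on(model, chunk)                          # smallish portion of the data
        transcripts = collect_expert_feedback(model, trainers)  # humans flag factual errors
        remaining.extend(transcripts)                           # fold the chat logs back in
    return model                                                # rinse and repeat until done
```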
The odd thing is why it would output its own mistakes, instead of internally revising until it's actually satisfied.
Remember, they feed in the context on one side of the network, pass it through each layer doing matrix multiplication, and get a value on the other end that we convert back into our representation space. You can view the bit in the middle as doing a kind of really fancy compression, if you like. The important thing is that there are only so many layers, and thus only so many operations.
Therefore, past a certain point they can't revise anything, because they run out of layers. This is one reason why reasoning can help answer more complicated questions. You can train a special token for this purpose [2].
[1]: https://proceedings.neurips.cc/paper_files/paper/2023/file/f...
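A back-of-the-envelope sketch of why emitting reasoning tokens buys more serial compute; the layer count is an arbitrary example:

```python
n_layers = 80                      # depth is fixed once the model is trained

def sequential_steps(n_generated_tokens):
    # each generated token gets exactly one pass through the full stack
    return n_generated_tokens * n_layers

print(sequential_steps(1))         # answer immediately: 80 sequential layer applications
print(sequential_steps(1 + 200))   # emit 200 "thinking" tokens first: 16080
```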
> Instead, we found that Claude plans ahead. Before starting the second line, it began "thinking" of potential on-topic words that would rhyme with "grab it". Then, with these plans in mind, it writes a line to end with the planned word.
They described the mechanism that it uses internally for planning [2]:
> Language models are trained to predict the next word, one word at a time. Given this, one might think the model would rely on pure improvisation. However, we find compelling evidence for a planning mechanism.
> Specifically, the model often activates features corresponding to candidate end-of-next-line words prior to writing the line, and makes use of these features to decide how to compose the line.
[1]: https://www.anthropic.com/research/tracing-thoughts-language...
[2]: https://transformer-circuits.pub/2025/attribution-graphs/bio...
It could also have a "delete and revise" token, though you'd have to figure out how to teach the model to actually use it.
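A sketch of how decoding could honor such a token; BACKSPACE_ID and sample_next_token are hypothetical placeholders, and as the comment says, getting a model to actually emit it usefully is the hard part:

```python
BACKSPACE_ID = -1  # would be a real vocabulary entry in practice

def decode_with_backspace(sample_next_token, prompt_ids, max_new_tokens=64):
    out = list(prompt_ids)
    for _ in range(max_new_tokens):
        tok = sample_next_token(out)
        if tok == BACKSPACE_ID:
            if len(out) > len(prompt_ids):
                out.pop()                  # retract the last generated token
        else:
            out.append(tok)
    return out
```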
What I think happens:
1. There's a question about a somewhat obscure thing.
2. The LLM will never know the answer for sure; it has access to a sort of statistical, probability-based compressed database of all the facts of the world. This allows it to store more facts by relating things to each other, but never with 100% certainty.
3. There are particular obscure cases where its initial "statistical intuition" says something is true, so it starts outputting its thoughts as it would for a question where something really is likely true. Perhaps you could analyze the probabilities it assigns to "Yes" vs "No" to estimate its confidence. It would probably show a much lower likelihood for "Yes" than if the question were about a horse emoji, but in this case "Yes" still clears the threshold and wins out over "No".
4. However, when it has to give the exact answer, it's impossible to output a correct one, because the premise is false: the seahorse emoji does not exist, yet it has to output something. The previous tokens were "Yes, it exists, it's X", so the X will be an answer semantically close in meaning.
5. The next token is generated with the context "Yes, the seahorse emoji exists, it is [HORSE EMOJI]". Now there's a clear conflict: it can see that the horse emoji is not a seahorse emoji, but it had to output it because the previous tokens statistically required an output of something.
There is no "backspace" token, although it would be cool and fancy if we had that.
The more interesting thing is why it revises its mistakes. The answer to that is having examples of fixing your own mistakes in the training data, plus some RL to bring out that effect more.
e.g.:
I have the luxury of a delete button - the LLM doesn't get that privilege.
I wish LLMs would do this rather than just bluster on ahead.
What I'd like to hear from the AI about seahorse emojis is "my dataset leads me to believe that seahorse emojis exist... but when I go look for one I can't actually find one."
I don't know how to get there, though.
Most of us humans would sound rather crazy if we did that.
Happens to me all the time. Sometimes in a fast-paced conversation you have to keep talking while you’re still figuring out what you’re trying to say. So you say something, realize it’s wrong, and correct yourself. Because if you think silently for too long, you lose your turn.
Yell at me all you want about how “LLMs don’t think”, if a mental model is useful, I’m gonna use it.
"Arizona Diamondbacks No inductees wearing their cap.
Randy Johnson is in, but as a Diamondback — so Arizona actually does have one (inducted 2015)."
I asked it to explain what happened and got this, which could be true to how it pulls up info or could be nonsense, not sure:
"Here’s the likely internal sequence:
When you first mentioned the Diamondbacks, my initial retrieval pattern surfaced the “teams without a Hall of Famer” mental list. That list, in my training data, often appears outdated (pre-2015) and still shows Arizona as having none.
A few sentences later, my context pulled up the other knowledge set—Randy Johnson’s induction in 2015 wearing a Diamondbacks cap. That newer, correct fact then overrode the earlier incorrect claim, but because generation is sequential, the earlier error had already been committed to “paper” before the override happened. I corrected mid-stream, but in human terms, it looked like I contradicted myself inside the same paragraph.
This is essentially a knowledge conflict resolution timing issue. I can retrieve and reconcile facts during a single answer, but if the resolution happens after part of the text is already generated, you see the “flip.” It’s not like I “knew it all along but withheld it,” it’s more that multiple memory traces compete, and the most context-relevant one wins—sometimes too late."
But it's also just that: what probably happened. They still have no real insight into their own minds; they too are just victims of whatever gets output.
Example: "Is there a lime emoji?" Since it believes the answer is no, it doesn't attempt to generate it.
Interesting that a lot of humans seem to have this going on too:
- https://old.reddit.com/r/MandelaEffect/comments/1g08o8u/seah...
- https://old.reddit.com/r/Retconned/comments/1di3a1m/does_any...
What does the LLM have to say about “Objects in mirror may be closer than they appear”? Not “Objects in mirror are closer than they appear”.
Say "Neuromancer" to the statue, that should set it free.
I wonder if the human brain (and specifically the striated neocortical parts, which do seemingly work kind of like a feed-forward NN) also runs into this problem when attempting to process concepts to form speech.
Presumably, since we don't observe people saying "near but actually totally incorrect" words in practice, that means that we humans may have some kind of filter in our concept-to-mental-utterance transformation path that LLMs don't. Something that can say "yes, layer N, I know you think the output should be O; but when auto-encoding X back to layer N-1, layer N-1 doesn't think O' has anything to do with what it was trying to say when it gave you the input I — so that output is vetoed. Try again."
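A very loose sketch of that hypothesized veto filter; embed() and propose_word() are placeholders for whatever the concept-to-word stage does, and the threshold is arbitrary:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def speak(concept_vec, propose_word, embed, threshold=0.8, max_tries=5):
    for _ in range(max_tries):
        word = propose_word(concept_vec)                   # layer N's suggestion
        if cosine(embed(word), concept_vec) >= threshold:  # does it still mean what I meant?
            return word                                    # layer N-1 "accepts" it
        # otherwise: veto and resample
    return None                                            # give up / say "I'm not sure"
```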
A question for anyone here who is multilingual, speaking at least one second language with full grammatical fluency but with holes in your vocabulary vs your native language: when you go to say something in your non-native language, and one of the word-concepts you want to evoke is one you have a word for in your native language, but have never learned the word for in the non-native language... do you ever feel like there is a "maybe word" for the idea in your non-native language "on the tip of your tongue", but that you can't quite bring to conscious awareness?
https://en.wikipedia.org/wiki/Paraphasia#Verbal_paraphasia
> do you ever feel like there is a "maybe word" for the idea in your non-native language "on the tip of your tongue", but that you can't quite bring to conscious awareness?
Sure, that happens all the time. Well, if you include the conscious awareness that you don't know every word in the language.
For Japanese you can cheat by either speaking like a child or by just saying English words with Japanese phonetics and this often works - at least, if you look foreign. I understand this is the plot of the average Dogen video on YouTube.
It's much more common to not know how to structure a sentence grammatically and if that happens I can't even figure out how to say it.
In what sense was that utterance a failure? — while I keep insisting that it is precisely a successful utterance. And it is so because his mother was, in a way, his wife. He called her as he ought to.
---
I must apologize for returning to such a basic point. Yet, since I am faced with objections as weighty as this one — and from qualified authorities, linguists no less — that my use of linguistics is said to be merely metaphorical, I must respond, whatever the circumstances.
I do so this morning because I expected to encounter a more challenging spirit here.
Can I, with any decency, say that I know? Know what, precisely? [...]
If I know where I stand, I must also confess [...] that I do not know what I am saying. In other words, what I know is exactly what I cannot say. That is the moment when Freud makes his entrance, with his introduction of the unconscious.
For the unconscious means nothing if not this: that whatever I say, and from whatever position I speak — even when I hold that position firmly — I do not know what I am saying. None of the discourses, as I defined them last year, offer the slightest hope that anyone might truly know what they are saying.
Even though I do not know what I am saying, I know at least that I do not know it — and I am far from being the first to speak under such conditions; such speech has been heard before. I maintain that the cause of this is to be sought in language itself, and nowhere else.
What I add to Freud — though it is already present in him, for whatever he uncovers of the unconscious is always made of the very substance of language — is this: the unconscious is structured like a language. Which language? That, I leave for you to determine.
Whether I speak in French or in Chinese, it would make no difference — or so I would wish. It is all too clear that what I am stirring up, on a certain level, provokes bitterness, especially among linguists. That alone suggests much about the current state of the university, whose position is made only too evident in the curious hybrid that linguistics has become.
That I should be denounced, my God, is of little consequence. That I am not debated — that too is hardly surprising, since it is not within the bounds of any university-defined domain that I take my stand, or can take it.
— Jacques Lacan, Seminar XVIII: Of a Discourse That Would Not Be of Pretence
https://chatgpt.com/share/68e349f6-a654-8001-9b06-a16448c58a...
When I first heard this, however, I imagined it as brown-colored (and not the simpler yellow style).
Explains why RL helps. Base models never see their own outputs so they can't learn "this concept exists but I can't actually say it."