If I agree with your definition of hallucinations in the context of LLMs... then isn't your second paragraph just describing a way to artificially increase the likelihood of them occurring?
You seem to differentiate between a hallucination caused by poisoning the dataset and one produced from correct data, but can you honestly make such a distinction considering just how much data goes into these models?
Frankly, "hallucination" as used with LLMs today is not really a technical term at all. It just means "this particular randomly sampled stream of language produced sentences that communicate falsehoods."
There's a strong argument to be made that the word is dangerously misleading, because it implies there's some difference between how the model functions while producing a hallucinatory sample versus a non-hallucinatory one. There isn't. LLMs produce streams of language sampled from a probability distribution. As an unexpected side effect of producing coherent language, these streams will often contain factual statements. Other times the stream contains statements that are untrue. "Hallucination" doesn't really exist as an identifiable concept within the architecture of the LLM; it's just a somewhat subjective judgement by humans about the language stream.
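To make that concrete, here's a toy sketch in pure Python (the tokens and probabilities are made up for illustration, not taken from any real model): the factually correct continuation and the false ones fall out of the exact same sampling step, and nothing in the mechanism marks either outcome as a "hallucination".

```python
# Toy sketch, not a real model: a "true" and a "false" continuation both
# come out of the same sampling step; the mechanism never labels either one.
import random

# Hypothetical next-token distribution after the prompt
# "The capital of Australia is" -- the numbers are invented.
next_token_probs = {
    "Canberra": 0.55,    # factually correct continuation
    "Sydney": 0.35,      # plausible but false continuation
    "Melbourne": 0.10,   # also plausible but false
}

def sample_next_token(probs):
    """Sample one token from the distribution -- the only thing the model does."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Run the identical sampling step many times: sometimes the stream is
# "factual", sometimes it "hallucinates", but the procedure never changes.
counts = {token: 0 for token in next_token_probs}
for _ in range(10_000):
    counts[sample_next_token(next_token_probs)] += 1
print(counts)
```

Whether a given run counts as a hallucination is decided after the fact by a human comparing the output to the world, not by anything inside the sampling loop.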
So much mangling of meaning.
For example, the “AI” that detects spam is very different from an LLM.
In your example, that's just pollution of the training set by spam, but that's not much of an issue in practice; AI has been better than humans at classifying spam for over a decade now.
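For what it's worth, the kind of “AI” behind spam filtering is typically a discriminative classifier: it maps a message to a spam/ham label rather than sampling language, so there is no generated stream to hallucinate in the first place. A minimal sketch of that kind of model (using scikit-learn and a made-up toy dataset, purely to illustrate the contrast):

```python
# Minimal sketch of the kind of "AI" behind spam filtering: a discriminative
# classifier that assigns a spam/ham label, not a generative language model.
# Uses scikit-learn; the training data here is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "Win a free iPhone now, click here",
    "Limited offer: cheap meds, no prescription",
    "Lunch tomorrow at noon?",
    "Here are the meeting notes from Tuesday",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words features + naive Bayes: the classic spam-filter recipe.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(messages, labels)

print(classifier.predict(["Click here for a free offer"]))      # likely 'spam'
print(classifier.predict(["Can we move the meeting to 3pm?"]))  # likely 'ham'
```

Poisoning that training set shows up as a measurable drop in classification accuracy on held-out mail, which is a very different failure mode from the after-the-fact human judgement that “hallucination” names.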