Imagine if, instead of predicting just the next token, the LLM also predicted a mask over the previous tokens; that mask is then thresholded and only the "relevant" tokens are kept for the next inference step.
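Something like this toy sketch, where `score_relevance` is a made-up stand-in for a model head that scores prior tokens (nothing here is a real API):

```
# Toy sketch: score_relevance() stands in for a hypothetical model head
# that predicts a relevance mask over prior tokens. Purely illustrative.

def score_relevance(tokens: list[str]) -> list[float]:
    # Stub heuristic so the sketch runs end to end.
    return [0.1 if t in {"um", "uh", "the", "a"} else 0.9 for t in tokens]

def prune_context(tokens: list[str], threshold: float = 0.5) -> list[str]:
    """Threshold the predicted mask and keep only the 'relevant' tokens."""
    scores = score_relevance(tokens)
    return [t for t, s in zip(tokens, scores) if s >= threshold]

history = ["um", "the", "user", "wants", "a", "rust", "parser"]
print(prune_context(history))  # ['user', 'wants', 'rust', 'parser']
```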
One key distinction between humans and LLMs is that humans are excellent at forgetting irrelevant data. We forget tens of times a second and only keep what's necessary.
People have also been reporting that ChatGPT's new "memory" feature is poisoning their context. But context is also useful. I think AI companies will have to put a lot of engineering effort into keeping those LLMs on the happy path even with larger and larger contexts.
Pure speculation on my part but it feels like this may be a major component of the recent stories of people being driven mad by ChatGPT - they have extremely long conversations with the chatbot where the outputs start seeming more like the "spicy autocomplete" fever dream creative writing of pre-RLHF models, which feeds and reinforces the user's delusions.
Many journalists have complained that they can't seem to replicate this kind of behavior in their own attempts, but maybe they just need a sufficiently long context window?
In Claude Code you can use /clear to clear context, or /compact <optional message> to compact it down, with the message guiding what stays and what goes. It's helpful.
Claude has some amazing features like this that aren't very well documented. Just yesterday I learned it writes sessions to disk and you can resume them where you left off with --continue or --resume if you accidentally close the terminal or something.
Also loving shift+tab (twice) to enter plan mode. Just adding it here in case it helps anyone else.
I know there is work being done on LLM "memory," for lack of a better term, but I have yet to see models get more responsive over time with this kind of feedback. I know I can flag it, but right now it doesn't help my "running" context that would be unique to me.
I have a similar thought about LLM "membranes," which combine the learning from multiple users to become more useful. I'm keeping a keen eye on that, as I think it will make them more useful at an organizational level.
But OpenAI and friends should let me purge my questions and, more importantly, the LLM's responses from the chat. More often than not, it's poisoning itself with bad ideas, flip-flopping, etc. I hate having to pick up and move to a new chat, but if I don't, the conversation will only go downhill.
A silly example is any of the riddles where you simplify it to the point of being obvious and the LLM still can't get it (mostly gone with recent big models), like: "A man, a sheep, and a boat need to get across a river. How can they do this safely without the sheep being eaten?"
A more practically infuriating example is when you want to do something slightly different from a very common problem. The LLM might eventually get it right, after too much guidance, but then it'll slowly revert back to the "common" case. For example, replacing whole chunks of code with the common variant when you tell it to add comments. This happens frequently to me with super basic vector math.
This is possible in tools like LM Studio when running LLMs locally; it's a choice by the implementer to grant this ability to end users. You pass the entire context to the model in each turn of the conversation, so there's no technical reason stopping this feature from existing, besides maybe some cost benefits to the inference vendor from prompt caching.
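Since the client resends the whole message list every turn, "purging" a bad exchange is just list surgery before the next request. Rough sketch, with `call_model` as a placeholder for whatever chat-completion call you use:

```
# Rough sketch: the client owns the message list it resends each turn,
# so deleting a poisoned exchange is just removing entries before the next call.
# call_model() is a placeholder, not a real API.

def call_model(messages: list[dict]) -> str:
    raise NotImplementedError("plug in your chat-completion call here")

messages = [
    {"role": "user", "content": "Refactor this vector math helper..."},
    {"role": "assistant", "content": "Sure, here's a rewrite that deletes half the file..."},  # poisoned turn
    {"role": "user", "content": "No, just add comments, keep the math."},
]

def purge_turn(messages: list[dict], index: int) -> list[dict]:
    """Drop a single message so it never re-enters the context."""
    return messages[:index] + messages[index + 1:]

messages = purge_turn(messages, 1)   # forget the bad assistant reply
# reply = call_model(messages)       # the next turn sees only the pruned history
```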
It already exists in tools like block.github.io/goose:
```
Summarize Conversation

This will summarize your conversation history to save context space.
Previous messages will remain visible but only the summary will be included in the active context for Goose. This is useful for long conversations that are approaching the context limit.
```
This is already pretty much figured out: https://www.promptingguide.ai/techniques/react
We use it at work and we never encounter this kind of issue.
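For anyone who hasn't read the guide: ReAct interleaves Thought / Action / Observation steps, and the harness only feeds the observations back into the prompt. A bare-bones sketch of that loop (the single `search` tool and `call_model` are placeholders, not part of the guide):

```
import re

# Bare-bones ReAct-style loop: the model emits Thought/Action lines, the
# harness runs the action and appends only the Observation to the prompt.
# call_model() and the 'search' tool are placeholders.

def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def search(query: str) -> str:
    return f"(stub result for {query!r})"

def react(question: str, max_steps: int = 5) -> str:
    prompt = (
        "Answer the question by interleaving Thought, Action, Observation.\n"
        "Actions: search[query] or finish[answer].\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        step = call_model(prompt)
        prompt += step + "\n"
        done = re.search(r"finish\[(.*?)\]", step)
        if done:
            return done.group(1)
        act = re.search(r"search\[(.*?)\]", step)
        if act:
            prompt += f"Observation: {search(act.group(1))}\n"
    return "no answer within step budget"
```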
It mostly happens when you pass it similar but updated code; for some reason it then doesn't really see the newest version and reasons over the obsolete content.
I've had one chat recover from this, though.
I try to keep my prompts hygienic; if I get anything bad in the result, I edit the prompt to get a better one rather than correcting it in conversation.
They really need to figure out a way to delete or "forget" prior context, so the user or even the model can go back and prune poisonous tokens.
Right now I work around it by regularly making summaries of instances, then spinning up a new instance with fresh context and feeding in the summary of the previous instance.
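That workflow is easy to script if your tool exposes the raw conversation; a sketch, again with `call_model` as a placeholder for whatever chat API or CLI you drive:

```
# Sketch of the summarize-then-respawn workflow. call_model() is a placeholder.

def call_model(messages: list[dict]) -> str:
    raise NotImplementedError("plug in your chat-completion call here")

def respawn_with_summary(old_messages: list[dict]) -> list[dict]:
    """Ask the old instance for a handoff summary, then seed a fresh context with it."""
    summary = call_model(old_messages + [{
        "role": "user",
        "content": "Summarize this session: goals, decisions made, and open TODOs, "
                   "so a fresh session can continue without the full history.",
    }])
    return [{"role": "user", "content": f"Context from the previous session:\n{summary}"}]

# fresh = respawn_with_summary(long_poisoned_history)
# reply = call_model(fresh + [{"role": "user", "content": "Continue with the open TODOs."}])
```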