Imagine if, instead of predicting just the next token, the LLM also predicted a mask over the previous tokens; that mask is then thresholded and only the "relevant" tokens are kept for the next inference step.
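Something like this toy sketch, where `score_relevance` is a made-up stand-in for a model head that scores prior tokens (nothing here is a real API):

```
# Toy sketch: score_relevance() stands in for a hypothetical model head
# that predicts a relevance mask over prior tokens. Purely illustrative.

def score_relevance(tokens: list[str]) -> list[float]:
    # Stub heuristic so the sketch runs end to end.
    return [0.1 if t in {"um", "uh", "the", "a"} else 0.9 for t in tokens]

def prune_context(tokens: list[str], threshold: float = 0.5) -> list[str]:
    """Threshold the predicted mask and keep only the 'relevant' tokens."""
    scores = score_relevance(tokens)
    return [t for t, s in zip(tokens, scores) if s >= threshold]

history = ["um", "the", "user", "wants", "a", "rust", "parser"]
print(prune_context(history))  # ['user', 'wants', 'rust', 'parser']
```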
One key distinction between humans and LLMs is that humans are excellent at forgetting irrelevant data. We forget tens of times a second and only keep what's necessary.
People have also been reporting that ChatGPT's new "memory" feature is poisoning their context. But context is also useful. I think AI companies will have to put a lot of engineering effort into keeping those LLMs on the happy path even with larger and larger contexts.
Pure speculation on my part but it feels like this may be a major component of the recent stories of people being driven mad by ChatGPT - they have extremely long conversations with the chatbot where the outputs start seeming more like the "spicy autocomplete" fever dream creative writing of pre-RLHF models, which feeds and reinforces the user's delusions.
Many journalists have complained that they can't seem to replicate this kind of behavior in their own attempts, but maybe they just need a sufficiently long context window?
In Claude Code you can use /clear to clear context, or /compact <optional message> to compact it down, with the message guiding what stays and what goes. It's helpful.
Claude has some amazing features like this that aren't very well documented. Just yesterday I learned it writes sessions to disk and you can resume them where you left off with --continue or --resume if you accidentally close the terminal or something.
Also loving shift+tab (twice) to enter plan mode. Just adding it here in case it helps anyone else.
I know there is work being done on LLM "memory," for lack of a better term, but I have yet to see models get more responsive over time with this kind of feedback. I know I can flag it, but right now it doesn't help my "running" context that would be unique to me.
I have a similar thought about LLM "membranes," which combine the learning from multiple users to become more useful. I'm keeping a keen eye on that, as I think it will make them more useful at an organizational level.
But OpenAI and friends should let me purge my questions and, more importantly, the LLM's responses from the chat. More often than not, it's poisoning itself with bad ideas, flip-flopping, etc. I hate having to pick up and move to a new chat, but if I don't, the conversation will only go downhill.
A silly example is any of the riddles where you simplify it to the point of being obvious and the LLM still can't get it (mostly gone with recent big models), like: "A man, a sheep, and a boat need to get across a river. How can they do this safely without the sheep being eaten?"
A more practically infuriating example is when you want to do something slightly different from a very common problem. The LLM might eventually get it right, after too much guidance, but then it'll slowly revert back to the "common" case. For example, replacing whole chunks of code with the common variant when you tell it to add comments. This happens frequently to me with super basic vector math.
This is possible in tools like LM Studio when running LLMs locally; it's a choice by the implementer to grant this ability to end users. You pass the entire context to the model in each turn of the conversation, so there's no technical reason stopping this feature from existing, besides maybe some cost benefits to the inference vendor from prompt caching.
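Since the client resends the whole message list every turn, "purging" a bad exchange is just list surgery before the next request. Rough sketch, with `call_model` as a placeholder for whatever chat-completion call you use:

```
# Rough sketch: the client owns the message list it resends each turn,
# so deleting a poisoned exchange is just removing entries before the next call.
# call_model() is a placeholder, not a real API.

def call_model(messages: list[dict]) -> str:
    raise NotImplementedError("plug in your chat-completion call here")

messages = [
    {"role": "user", "content": "Refactor this vector math helper..."},
    {"role": "assistant", "content": "Sure, here's a rewrite that deletes half the file..."},  # poisoned turn
    {"role": "user", "content": "No, just add comments, keep the math."},
]

def purge_turn(messages: list[dict], index: int) -> list[dict]:
    """Drop a single message so it never re-enters the context."""
    return messages[:index] + messages[index + 1:]

messages = purge_turn(messages, 1)   # forget the bad assistant reply
# reply = call_model(messages)       # the next turn sees only the pruned history
```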
It already exists in tools like block.github.io/goose:
```
Summarize Conversation

This will summarize your conversation history to save context space.
Previous messages will remain visible but only the summary will be included in the active context for Goose. This is useful for long conversations that are approaching the context limit.
```
This is already pretty much figured out: https://www.promptingguide.ai/techniques/react
We use it at work and we never encounter this kind of issue.
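For anyone who hasn't read the guide: ReAct interleaves Thought / Action / Observation steps, and the harness only feeds the observations back into the prompt. A bare-bones sketch of that loop (the single `search` tool and `call_model` are placeholders, not part of the guide):

```
import re

# Bare-bones ReAct-style loop: the model emits Thought/Action lines, the
# harness runs the action and appends only the Observation to the prompt.
# call_model() and the 'search' tool are placeholders.

def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def search(query: str) -> str:
    return f"(stub result for {query!r})"

def react(question: str, max_steps: int = 5) -> str:
    prompt = (
        "Answer the question by interleaving Thought, Action, Observation.\n"
        "Actions: search[query] or finish[answer].\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        step = call_model(prompt)
        prompt += step + "\n"
        done = re.search(r"finish\[(.*?)\]", step)
        if done:
            return done.group(1)
        act = re.search(r"search\[(.*?)\]", step)
        if act:
            prompt += f"Observation: {search(act.group(1))}\n"
    return "no answer within step budget"
```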
It mostly happens when you pass it similar but updated code; for some reason it then doesn't really see the newest version and reasons over the obsolete content.
I've had one chat recover from this, though.
I try to keep my prompts hygienic; if I get anything bad in the result, I edit the prompt to get a better one rather than correcting it in conversation.
They really need to figure out a way to delete or "forget" prior context, so the user or even the model can go back and prune poisonous tokens.
Right now I work around it by regularly making summaries of instances, then spinning up a new instance with fresh context and feeding in the summary of the previous instance.
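That workflow is easy to script if your tool exposes the raw conversation; a sketch, again with `call_model` as a placeholder for whatever chat API or CLI you drive:

```
# Sketch of the summarize-then-respawn workflow. call_model() is a placeholder.

def call_model(messages: list[dict]) -> str:
    raise NotImplementedError("plug in your chat-completion call here")

def respawn_with_summary(old_messages: list[dict]) -> list[dict]:
    """Ask the old instance for a handoff summary, then seed a fresh context with it."""
    summary = call_model(old_messages + [{
        "role": "user",
        "content": "Summarize this session: goals, decisions made, and open TODOs, "
                   "so a fresh session can continue without the full history.",
    }])
    return [{"role": "user", "content": f"Context from the previous session:\n{summary}"}]

# fresh = respawn_with_summary(long_poisoned_history)
# reply = call_model(fresh + [{"role": "user", "content": "Continue with the open TODOs."}])
```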