
Interesting. Even the most recent models perform relatively poorly when asked to identify which information in a context has been removed, given access to both the original and edited contexts.

The authors posit that performance is poor because the attention mechanism of a Transformer cannot attend to the removed tokens: there are no keys for them!

Thank you for sharing on HN.


yorwba
There are keys to attend to; they're just in the original text instead of the modified one. Since the model receives both as input, it could theoretically attend to those keys.

For the attention mechanism, there isn't much difference between

  Original: {shared prefix} {removed part} {shared suffix} Modified: {shared prefix} {shared suffix}
and

  Original: {shared prefix} {shared suffix} Modified: {shared prefix} {added part} {shared suffix}
I think you could implement an algorithm for this in RASP (a language for manually programming transformers) roughly like this:

1. The first layer uses attention to the "Original:" and "Modified:" tokens to determine whether the current token is in the original or modified parts.

2. The second layer has one head attend equally to all original tokens, which averages their values, and another head attends equally to all modified tokens, averaging them as well. The averages are combined by computing their difference.

3. The third layer attends to tokens that are similar to this difference, which would be the ones in the {removed part}/{added part}.

The only ordering-dependent part is whether you compute the difference as original_average - modified_average or the other way around.
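
Not actual RASP, but here's a minimal numpy sketch of those three steps, with toy random embeddings and hand-wired "attention" (uniform averaging and dot-product similarity) standing in for trained heads; the sentences and dimensions are made up for illustration:

  # Minimal numpy sketch of the three steps above (not actual RASP).
  # Toy random embeddings and hand-wired "attention" (uniform averaging,
  # dot-product similarity) stand in for trained heads.
  import numpy as np

  rng = np.random.default_rng(0)

  original = "the quick brown fox jumps over the lazy dog".split()
  modified = "the quick fox jumps over the lazy dog".split()  # "brown" removed

  emb = {w: rng.normal(size=64) for w in sorted(set(original + modified))}
  O = np.stack([emb[w] for w in original])  # values of the original tokens
  M = np.stack([emb[w] for w in modified])  # values of the modified tokens

  # Step 1 (is this token in the original or the modified part?) is trivial
  # here because O and M are already separate arrays.

  # Step 2: uniform attention over each segment averages its values;
  # combine the two averages by taking their difference.
  diff = O.mean(axis=0) - M.mean(axis=0)

  # Step 3: attend to tokens similar to the difference vector; the removed
  # token should score highest among the original tokens.
  scores = O @ diff
  print(original[int(np.argmax(scores))])  # expected to print "brown"

For an added part you'd flip the sign of the difference and score the modified tokens instead, which is the ordering-dependent bit mentioned above.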

If a model can detect additions but not removal, that would show that it is capable of learning this or a similar algorithm in principle, but wasn't trained on enough removal-style data to develop the necessary circuitry.

ironmanszombie
Thanks for the breakdown. I am far from knowledgeable on AI, but I was wondering why a simple comparison can't work? It can definitely be coded, as you have beautifully demonstrated.
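
If the "simple comparison" means comparing the two texts outside the model, that's just a classical diff; here's a minimal sketch using Python's standard-library difflib on made-up sentences (the open question in the thread is whether a transformer can learn to do something like this internally with attention):

  # A classical diff finds removed/added spans directly.
  import difflib

  original = "the quick brown fox jumps over the lazy dog".split()
  modified = "the quick fox jumps over the lazy dog".split()

  matcher = difflib.SequenceMatcher(a=original, b=modified)
  for tag, i1, i2, j1, j2 in matcher.get_opcodes():
      if tag == "delete":
          print("removed:", " ".join(original[i1:i2]))  # -> removed: brown
      elif tag == "insert":
          print("added:", " ".join(modified[j1:j2]))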
yorwba
A simple comparison between which two vectors?
cyanydeez
For vision models, I wonder if they can train on things like photo negatives, rotated images, etc. Or madlib-like sentences, where a Q/A is like "the _____ took first place in the horse show."
bearseascape
The madlib-like sentences approach is actually how masked token prediction works! It was one of the pretraining tasks for BERT, but nowadays I think all (?) LLMs are trained with next-token prediction instead.
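
For concreteness, that kind of madlib sentence is exactly what a fill-mask model completes. A quick sketch with the Hugging Face transformers pipeline (assumes the package is installed and the bert-base-uncased weights can be downloaded; the model choice is just an example):

  # BERT-style masked token prediction on a madlib-like sentence.
  from transformers import pipeline

  fill_mask = pipeline("fill-mask", model="bert-base-uncased")
  for pred in fill_mask("The [MASK] took first place in the horse show."):
      print(f"{pred['token_str']:>10}  {pred['score']:.3f}")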
latency-guy2
For photo negatives, it usually doesn't matter. I'm not up to date with what the vision folks at these companies are doing, but images are often single-channel, and for regular images more likely than not greyscale. The radar folks work in the complex domain instead, and those aren't RGB-based images at all; they're defined by scatterers.

Whether additional channels were recognized in training usually didn't matter for the experiments and models I dealt with before 2022, and when it did, it certainly didn't matter for color. Then again, my work was object detection and classification on known classes (plus some additional confusers), where color pretty much didn't matter in the first place.

usaar333
They don't seem to use any of the recent top models: no Opus, no o3, no Gemini 2.5 Pro.
cs702 OP
It seems they used the most recent models available as of March 2025.
Even so, there are notable differences between them, so now that there's a benchmark and some attention on this issue, I wonder how much better models can get. Clearly something can be done.
