There are keys to attend to; they're just in the original text rather than the modified one. Since the model receives both as input, it could in principle attend to those keys.

For the attention mechanism, there isn't much difference between

  Original: {shared prefix} {removed part} {shared suffix} Modified: {shared prefix} {shared suffix}
and

  Original: {shared prefix} {shared suffix} Modified: {shared prefix} {added part} {shared suffix}
I think you could implement an algorithm for this in RASP (a language for manually programming transformers) roughly like this:

1. The first layer uses attention to the "Original:" and "Modified:" tokens to determine whether the current token is in the original or modified parts.

2. The second layer has one head attend equally to all original tokens, which averages their values, and another head attends equally to all modified tokens, averaging them as well. The averages are combined by computing their difference.

3. The third layer attends to tokens that are similar to this difference, which would be the ones in the {removed part}/{added part}.

The only ordering-dependent part is whether you compute the difference as original_average - modified_average or the other way around.
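The three steps above can be sketched numerically. This is a toy NumPy simulation, not real RASP: the token embeddings are random vectors (an assumption for illustration; a trained transformer would have learned them), the two "heads" of step 2 are plain averages, and step 3's attention is a dot-product score against the difference vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: every token gets a random 64-dim embedding.
# (Illustrative assumption; real models learn these.)
vocab = {"the", "cat", "sat", "on", "mat", "REMOVED"}
emb = {t: rng.standard_normal(64) for t in vocab}

original = ["the", "cat", "REMOVED", "sat", "on", "mat"]
modified = ["the", "cat", "sat", "on", "mat"]

# Step 2: one head averages all original tokens, another all modified
# tokens; their difference points toward the removed content.
orig_avg = np.mean([emb[t] for t in original], axis=0)
mod_avg = np.mean([emb[t] for t in modified], axis=0)
diff = orig_avg - mod_avg  # swap the operands to look for additions instead

# Step 3: score each original token by similarity to the difference;
# the removed token should score highest.
scores = {t: float(emb[t] @ diff) for t in set(original)}
best = max(scores, key=scores.get)
print(best)  # → REMOVED
```

The ordering-dependence mentioned above shows up in exactly one place: `diff = orig_avg - mod_avg` finds removals, while `mod_avg - orig_avg` would find additions.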

If a model can detect additions but not removals, that would show it is capable in principle of learning this or a similar algorithm, but wasn't trained on enough removal-style data to develop the necessary circuitry.


ironmanszombie
Thanks for the breakdown. I am far from knowledgeable about AI, but I was wondering: why can't a simple comparison work? It can definitely be coded, as you have beautifully demonstrated.
yorwba OP
A simple comparison between which two vectors?
