I'm surprised because it's such a simple task: any human who is a bit diligent could figure it out, since they're given both the original and the modified version.
However, it feels a bit like counting letters, so maybe it can be solved with post-training. We'll know in 3 to 6 months whether it was easy for the labs to "fix" this.
In my daily use of LLMs, I regularly get overly optimistic answers because they fail to consider information that might be absent (which is even harder when it's out of context).
If we want models that detect absences, we need training objectives that expect absence, and maybe even input encodings that represent "this might have been here."
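To make that concrete, here's a minimal sketch of what such an objective could look like as a data-generation step (entirely my own toy construction; `make_absence_example`, the `p_keep` rate, and the -1 "nothing missing" label are all hypothetical): delete a random line from a document and train the model, given both versions, to say which line is gone, including a "nothing was deleted" case so it can't just assume something is always missing.

```python
import random

def make_absence_example(lines, p_keep=0.2):
    """Build one (original, modified, target) pair for a toy
    'detect the absence' objective.

    With probability p_keep nothing is deleted and the target is -1,
    so the model must also learn to answer "nothing is missing"
    instead of always guessing a deletion. Otherwise one line is
    removed and the target is its index in the original.
    """
    if not lines or random.random() < p_keep:
        return list(lines), list(lines), -1
    idx = random.randrange(len(lines))
    return list(lines), lines[:idx] + lines[idx + 1:], idx

# The model would see (original, modified) and predict `target`.
original, modified, target = make_absence_example(["a = 1", "b = 2", "c = 3"])
```

The "nothing was deleted" class is the part that actually encodes the expectation of absence: without it, the objective only teaches the model to localize deletions it's told exist, not to notice that something might be missing in the first place.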