In this benchmark they give the LLM the necessary information to determine what is missing. For example: “Here is a poem, and here is a version of that same poem that may or may not be missing lines. Are any lines missing?”
It’s more a tuning issue IMHO than an inherent weakness in LLMs.
If I were asked to find an omission in an ML paper, my brain would compare it with other ML papers; it would not need to compare it to Star Wars, Top Gear, Greek history, pottery, and the thousands of other contexts I may know about.
That is still hard. You only have so many attention heads looking for things; you can't pay attention to EVERYTHING, which is what is required to find the omission.
Here are two verses of a poem (song) in Mandarin Chinese:
yi quan ting ni de
er gei ni hao de
shu dao san yong yuan ai ni yi ge
si bu hui fan cuo
wu bu hui luo suo
shuo ni xiang shuo de
zuo ni xiang zuo de
bie pa shi bai yin wei ni you wo
pei ni kan ri luo
pei ni yi qi chang wan wo men ai de ge
I removed two lines. Where did that happen?
Would your answer be different if I told you that I might or might not have removed some lines?
> …
> I removed two lines. Where did that happen?
If you read the paper, you will see they provide the original as well as the version with missing information.
I did mention this in my comment too.
I am quite sure I could find your two missing lines if you provide me the full poem.
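With both versions in hand, the comparison is mechanical. Here is a minimal sketch in Python using difflib; the line lists are hypothetical stand-ins for the full poem and the gapped version:

    # Minimal sketch: given the original and the possibly-gapped version,
    # a plain sequence diff already reports which lines were removed.
    import difflib

    original = ["line 1", "line 2", "line 3", "line 4"]  # hypothetical full poem
    gapped = ["line 1", "line 3"]                         # version with omissions

    sm = difflib.SequenceMatcher(a=original, b=gapped)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "delete":
            print(f"removed from original at index {i1}: {original[i1:i2]}")

An LLM given both texts in context is being asked to do something close to this alignment; the harder task is the one where only the gapped text is available.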
Given that you are a prolific commenter on HN, I am sure an LLM could be fine-tuned to detect missing text from your comments without additional information. For example …
> WinForms is still around. There have been further tec lly just a big tire fire and about the best you can do is to ignore all of them and develop in WinForms.
It’s probably possible to detect that information is missing between “tec” and “lly”. But knowing what lies in between is not possible for a human either, beyond plausible guesses.
The fact that the original was provided doesn't demonstrate that it's necessary to the task. You can identify missing text without needing to know what was there.
> Given that you are a prolific commenter on HN, I am sure a LLM could be fine tuned to detect missing text from your comments without additional information.
Same thing. Why would you need to fine-tune on text authored by me? You can easily detect missing text of that style from the fact that the sentence you have fails to be English. You can do the same thing with text whose author you have no prior experience of.
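As a rough illustration of “fails to be English” (not what the benchmark does, just a sketch under assumptions): flag alphabetic tokens that don't appear in an English wordlist. The wordlist path is an assumption, and proper nouns like WinForms will also be flagged, so this is only a crude proxy for a language model's judgment:

    # Crude sketch: spot damaged text by flagging tokens that are not
    # plausible English words. /usr/share/dict/words is an assumed wordlist;
    # a real system would use a language model's perplexity instead.
    import re

    def load_vocab(path="/usr/share/dict/words"):
        with open(path) as f:
            return {w.strip().lower() for w in f}

    def suspicious_tokens(text, vocab):
        return [t for t in re.findall(r"[A-Za-z]+", text)
                if t.lower() not in vocab]

    vocab = load_vocab()
    comment = ("WinForms is still around. There have been further tec lly "
               "just a big tire fire and about the best you can do is to "
               "ignore all of them and develop in WinForms.")
    print(suspicious_tokens(comment, vocab))  # expect 'tec' and 'lly' in the output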
> I am quite sure I could find your two missing lines if you provide me the full poem.
But hey, if you insist:
轻轻贴近你的耳朵
사랑해요
情话永远不嫌太多对你说
一全听你的
二给你好的
数到三永远爱你一个
四不会犯错
五不会啰嗦
每天为你打 call, cook 也不错
轻轻贴近你的耳朵
사랑해요
情话永远不嫌太多对你说
打开你的爱情手册
就在此刻
为你唱的专属情歌要记得
说你想说的
做你想做的
别怕失败因为你有我
陪你看日落
陪你等雨过
陪你一起唱完我们爱的歌
轻轻贴近你的耳朵
사랑해요
情话永远不嫌太多对你说
打开你的爱情手册
就在此刻
为你唱的专属情歌要记得
我轻轻靠近你的耳朵
说爱你不嫌太多
如果相遇的几率亿万分之一那么多
请相信我的真真真心比宇宙还辽阔
我会牵着你的手知道你全部接受
打开你的爱情手册
就在此刻
这首专属情歌 请记得
I’d therefore conjecture that lines are missing after the two lines ending in ‘ge’.
This of course assumes Chinese poetry is based on matching vowels, as is the case in e.g. German, and not on rhythm, as would be the case in Latin or Arabic.
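If you want to check that kind of conjecture mechanically, one rough approach is to strip each pinyin line down to its final rime (vowel cluster plus an optional n/ng) and see where the pattern changes. A sketch over the quoted pinyin lines; the regex is a rough approximation of a pinyin final, not a proper parser:

    # Rough sketch: print the final rime of each pinyin line so the rhyme
    # groups (and possible breaks where lines were removed) stand out.
    import re

    lines = [
        "yi quan ting ni de",
        "er gei ni hao de",
        "shu dao san yong yuan ai ni yi ge",
        "si bu hui fan cuo",
        "wu bu hui luo suo",
        "shuo ni xiang shuo de",
        "zuo ni xiang zuo de",
        "bie pa shi bai yin wei ni you wo",
        "pei ni kan ri luo",
        "pei ni yi qi chang wan wo men ai de ge",
    ]

    for line in lines:
        last = line.split()[-1]
        m = re.search(r"[aeiou]+(?:ng?)?$", last)
        rime = m.group() if m else "?"
        print(f"{rime:>3}  {line}")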
For needle-in-a-haystack tasks you have to pay attention to the thing you are trying to find. Attention can do this pretty well.
When looking for an omission, the omission can be anything; you can only reason about it by comparing one whole context to another. The attention layers can't really do that.
This is similar to the "rank a long set of things" problem. Absent some metacognitive process, they just can't do that.
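To make the "attention budget" point concrete, here is a toy single-head scaled dot-product attention sketch in numpy (random data, made-up shapes): each query's weights are a softmax over positions, so they sum to 1, and putting more weight on one token necessarily takes it from the others. Retrieving a known needle fits that mechanism well; comparing everything against everything to notice what is absent does not.

    # Toy single-head scaled dot-product attention: each row of the weight
    # matrix is a softmax over the context, so it sums to 1: a fixed
    # attention "budget" per query.
    import numpy as np

    def attention(Q, K, V):
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                  # (n_queries, n_keys)
        scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True) # softmax over keys
        return weights @ V, weights

    rng = np.random.default_rng(0)
    n, d = 8, 4  # toy context of 8 tokens
    Q = rng.normal(size=(n, d))
    K = rng.normal(size=(n, d))
    V = rng.normal(size=(n, d))
    _, w = attention(Q, K, V)
    print(w.sum(axis=-1))  # all 1.0: weight spent on one token is unavailable for the rest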