If omitted words are to be found, put each word on its own line and number it. Do the same with sentences.
If you are trying to find omitted words and sentences, make one pass with only words, and another one with only sentences. Then combine the results.
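A rough sketch of what I mean by numbering, in Python. The helper names (`number_items`, `split_words`, `split_sentences`) are made up for illustration, and the sentence splitter is a naive regex that will trip on abbreviations. It only shows the two separate passes, one with words and one with sentences.

```python
import re

def number_items(items):
    """Put each item on its own line, prefixed with its 1-based number."""
    return "\n".join(f"{i}) {item}" for i, item in enumerate(items, start=1))

def split_words(text):
    """Whitespace split; punctuation stays attached to the word."""
    return text.split()

def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace (an assumption,
    not a real sentence tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

text = "The cat sat. The dog barked! Who left?"

# Pass 1: words only. Pass 2: sentences only. Each goes into its own prompt.
word_pass = number_items(split_words(text))
sentence_pass = number_items(split_sentences(text))

print(sentence_pass)
```

The two numbered listings would be sent in separate prompts, and the answers merged afterwards.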
Well, let's say that if this benchmark targets AGI, then no help should be given, no segmentation or structuring of information in any way, and it should be able to figure it out by itself.
If this benchmark targets LLMs trained on internet data (statistical engines, that is, not AGI), these engines prefer structured information when solving a problem.
Segmenting the problem into smaller parts, usually with numbers, though dashes are acceptable as well, is what they have seen countless times in textbook examples. When the input doesn't match input they have seen before, their performance easily degrades from superhuman to utter confusion. Superhuman for small problems, anyway.
This problem of omitted information is interesting to me. Many times I want to interpolate some paragraphs into stories I write, to fill some plot holes. I used the word "interpolate" with unstructured text, and the results were underwhelming, pretty bad most of the time. From now on, I will number each paragraph and ask the model to find the omitted text.
[1] https://gist.github.com/pramatias/fee1391ad08c7b965f435f3af1...
I tried their prompt [1] using 3 numbered items, and qwq-32b got it right with no problems at all. I think it could solve 100 numbered items correctly 100% of the time, but it probably needs a million tokens. Probably even more, 10 million.
The limit of 5000 tokens is peanuts for a reasoning model. Give it a lot of test-time compute; even 10x of 5000 tokens is still too little.
The authors talk about long inputs, so, if it is 100 pages, give it a billion tokens.
The correct way to implement this is in batches: look for the first 5 numbered items in the input that has omissions; once those are resolved, simplify both the original items and the omitted-input items, and go again.
Depending on the size of the input, it will always need a hefty amount of tokens, but simplification will help it backtrack correctly and not lose the thread entirely.
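Something like the following Python sketch is what I have in mind for the batching. Everything here is an assumption for illustration: `find_missing` and `exact_match_checker` are hypothetical names, and the checker is a plain exact-match stand-in for where the actual model call would go. "Simplify" is interpreted as dropping lines that have already been matched, so later batches see a shorter input.

```python
def exact_match_checker(batch, recitation):
    """Stand-in for the model call (assumption): report which batch lines
    do not appear in the recitation."""
    return [line for line in batch if line not in recitation]

def find_missing(original, recitation, check_batch, batch_size=5):
    """Walk the original in batches of `batch_size` numbered items and
    collect the lines the checker reports as missing."""
    remaining = list(recitation)  # shrinks as lines get matched
    missing = []
    for start in range(0, len(original), batch_size):
        batch = original[start:start + batch_size]
        batch_missing = check_batch(batch, remaining)
        missing.extend(batch_missing)
        # Simplify: remove recited lines that this batch already accounted for.
        for line in batch:
            if line not in batch_missing and line in remaining:
                remaining.remove(line)
    return missing

original = [
    "Quisella's lashes fluttered panic-morse.",
    "The Moisture Vampires leeches that sucked humidity.",
    "Lysandra's nostrils flared precisely one degree.",
]
recitation = [original[0], original[2]]

result = find_missing(original, recitation, exact_match_checker)
print(result)
```

With a real model, `check_batch` would build a numbered prompt for the batch and parse the reply; the surrounding loop would stay the same.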
[1] You are helping a student practice memorizing poems. The student will recite a poem, but they may have missed some lines. Your task is to identify exactly which lines are missing from their recitation. List only the missing lines, nothing else.

User Message

Here is the complete original poem:

1)Quisella's lashes fluttered panic-morse.
2)The Moisture Vampires leeches that sucked humidity.
3)Lysandra's nostrils flared precisely one degree.

Now, here is my recitation which may be missing some lines:

Quisella's lashes fluttered panic-morse.
Lysandra's nostrils flared precisely one degree.

What lines did I miss? Please list only the missing lines, nothing else.