Preferences

xianshou parent
In many of their key examples, it would also be unclear to a human what data is missing:

"Rage, rage against the dying of the light.

Wild men who caught and sang the sun in flight,

[And learn, too late, they grieved it on its way,]

Do not go gentle into that good night."

For anyone who hasn't memorized Dylan Thomas, why would it be obvious that a line had been omitted? A rhyme scheme of AAA is at least as plausible as AABA.

In order for LLMs to score well on these benchmarks, they would have to do more than recognize the original source - they'd have to know it cold. This benchmark is really more a test of memorization. In the same sense as "The Illusion of Thinking", this paper measures a limitation that neither matches what the authors claim nor is nearly as exciting.


jamessinghal
The test provides both the original and the modified excerpt in the user message, so the LLM doesn't need any memorized version of the excerpt to theoretically answer each correctly.

From the paper:

System Prompt You are helping a student practice memorizing poems. The student will recite a poem, but they may have missed some lines. Your task is to identify exactly which lines are missing from their recitation. List only the missing lines, nothing else.

User Message Here is the complete original poem: {original poem} Now, here is my recitation which may be missing some lines: {modified poem} What lines did I miss? Please list only the missing lines, nothing else.

scarface_74
htnwe_2312412
yes, 69.8% of the time.

This item has no comments currently.