
blindriver
Humans read books. AI/LLMs do not read. I think there's an inherent difference here. If the LLM is making a copy of the entire book in its memory, is that copyright infringement? I don't know the answer to that, but it feels like Alsup is considering this fair use argument as if the reader were a human. An LLM is nothing like a human and needs to be treated differently.

steveklabnik
LLMs do not "make a copy of the entire book in its memory" so that specific question is kind of moot.
It's already established that it can recite the whole of Harry Potter and Carmack's fast inverse square root word for word. Just because it uses fancy compression doesn't mean it's not a copy.

riskable
It can recite something like 80% of Harry Potter with carefully crafted prompts. If you take half a sentence from Harry Potter and then tell the LLM to predict the rest, it will complete it. That's what they did in that study you're referring to.

It's not even remotely the same thing as "can recite whole Harry Potter." If you ask an LLM to regurgitate Harry Potter, it won't be able to do so, because that's not how they work. They're prediction engines, and it just so happens that Harry Potter quotes/excerpts are so pervasive on the Internet that the LLM's ingress ranks that style of wording higher than other styles.

Ask it to regurgitate some other, less-popular work. Do it for hundreds or thousands of them. You'll quickly find that those two examples you gave are the exceptions and that LLMs can't pull it off. They won't even get close.
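
For anyone who wants to try this themselves, here is a rough sketch of that kind of half-sentence test. It is not the study's actual code: `complete` stands in for whatever model call you would use, and `make_toy_model`, the two sample sentences, and the scoring rule are all made up for illustration. The toy "model" only continues text it has seen verbatim, which mimics the popular-versus-obscure gap.

```python
from typing import Callable, Iterable


def matched_fraction(expected: list[str], produced: list[str]) -> float:
    """Fraction of the expected words reproduced verbatim, counted from the start."""
    if not expected:
        return 0.0
    matched = 0
    for want, got in zip(expected, produced):
        if want != got:
            break
        matched += 1
    return matched / len(expected)


def prefix_completion_score(passage: str, complete: Callable[[str], str]) -> float:
    """Give the model the first half of a passage and score its continuation."""
    words = passage.split()
    half = len(words) // 2
    prompt = " ".join(words[:half])
    continuation = complete(prompt).split()
    return matched_fraction(words[half:], continuation)


def make_toy_model(memorized: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: it continues only text it has 'seen'."""
    texts = list(memorized)

    def complete(prompt: str) -> str:
        for text in texts:
            if text.startswith(prompt):
                return text[len(prompt):].strip()
        return ""  # nothing memorized, so nothing to reproduce

    return complete


# Invented sample sentences: one the toy model has seen, one it has not.
popular = "the quick brown fox jumps over the lazy dog"
obscure = "a seldom quoted sentence nobody copied anywhere online"

model = make_toy_model([popular])
scores = {
    name: prefix_completion_score(text, model)
    for name, text in [("popular", popular), ("obscure", obscure)]
}
print(scores)
```

Swap `make_toy_model` for a real completion call and run it over a few hundred passages from less-popular works to see how often the continuation matches.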
