
bonoboTP
Surely this would require the observation that the public is actually using LLMs as a substitute for purchasing the book, i.e. they sit down and type "Generate me the first/second/third chapter of The Da Vinci Code" and then read it from there. In the cassette tape era, by contrast, it was easy to observe that people copied store-bought music and films and shared them among each other. I doubt that this is or will be a serious use case of LLMs.

Fluorescence
It's different but not in ways that make such interventions irrelevant e.g. why would we only care about lost sales? If copyright has been violated as a necessary means to generate new value, haven't the content creators earned this value?

Such imperfect measures offer a compromise between "big tech can steal everything" and "LLMs trained on unpurchased books are illegal".

It's not just books but any tragedy-of-the-commons situation where a "feeder industry" for training can be fatally undermined by the very LLM that desires future training data from that industry.

bonoboTP OP
> It's different but not in ways that make such interventions irrelevant e.g. why would we only care about lost sales? If copyright has been violated as a necessary means to generate new value, haven't the content creators earned this value?

Indeed the company should purchase the books. If they obtain copies in a process that violates copyright, then that's indeed a violation of copyright.

The current decision does not rule on the legality of obtaining the books without purchasing.

ethbr1
Anthropic apparently did it both ways. After realizing that pirating mass quantities of books for training wasn't a great legal look, it hired someone previously responsible for Google Books, who in turn contacted publishers about mass licensing their content for training use.

However, that option was ultimately not pursued as instead...

>> Anthropic spent many millions of dollars to purchase millions of print books, often in used condition. Then, its service providers stripped the books from their bindings, cut their pages to size, and scanned the books into digital form — discarding the paper originals. Each print book resulted in a PDF copy containing images of the scanned pages with machine-readable text (including front and back cover scans for softcover books). Anthropic created its own catalog of bibliographic metadata for the books it was acquiring. It acquired copies of millions of books, including of all works at issue for all Authors.

(from the ruling)

bonoboTP OP
Yes. And from the article "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for theft but it may affect the extent of statutory damages." Sounds reasonable (except for the "stole" and "theft" language -- a copyright violation is a copyright violation, not theft, not stealing).

If the actual model was trained from the unauthorized copies, and then they post-hoc bought the books, that doesn't retroactively cancel the initial copyright violation. As I understand it, they did not retrain the model using the OCR'd scans.

cmiles74
This strikes me as a weak example; I think it's clear that it's way too cumbersome to read an entire novel by asking an LLM to dictate it.

IMHO, a better example would be the AI generated summaries provided by Google. Often these summaries have sufficient information and detail that people do not read the source article. The authors aren't getting paid (perhaps through on-page ads, which are not viewed) and then go out of business.

This strikes me as a good fit for the tax-on-cassette metaphor.

bonoboTP OP
It's not a copyright violation to summarize (in different words).

Fluorescence
The impact of machinery forces a re-evaluation of any concept defined in terms of human capability, because scale and automation change its nature.

Duplicating a fragment can be legal, but duplicating any fragment on demand is not. Likewise, rephrasing a passage might be legal, but rephrasing any passage on demand might not be.

bonoboTP OP
That's reasonable. This would require broader and deeper thought and discussion apart from the strict legal debate. As in, what is the public interest here? What kinds of rules would bring social good? What should the law facilitate and what should it limit to achieve that? The problem is that we really don't know how things will play out; we have no long-term experience with these things yet. So it's all very speculative.

cmiles74
A quick Google search will reveal that this is not the case. Summaries of books or movies have no particular legal protection and the authors of those summaries may be sued by the owners of that content.

https://1minutebook.com/are-book-summaries-legal/

Fair use is a defense often cited in those cases but it's just that: a defense. Cliff Notes are often cited here but they actually license the content in many cases.

bonoboTP OP
I mean, have you actually read the text at the link you provided? Or just remembered something, googled quickly and sent a random hit without reading it? The quotes under "What do lawyers say? Listen to what a several Intellectual Property Lawyers are saying on “Are book summaries legal?”:" certainly seem to be closer to what I was claiming.

> If you want to write a summary of any novel, without quoting from it, you are free to do it

> Copyright does not protect ideas, only a particular expression of those ideas

> You would likely get in trouble only if your summary contained long excerpts directly from the book

> As long as you do not quote directly from the book, or copy any of the content, then writing a unique summary is not illegal. You can mention the title, you can even quote sentences from the book as long as they are cited, you just can’t reproduce chunks of the content

etc

(I'm also not sure whether this article is just blogspam or itself AI generated)

munificent
For narrative fiction, probably not, no.

But for non-fiction and technical books, they sure as hell will. For example, if I ask Gemini "What does Bob Nystrom say about the Singleton pattern in Game Programming Patterns?", it says:

---

In "Game Programming Patterns," Bob Nystrom dedicates an entire chapter to the Singleton pattern, which is unusual because, as he states, "Every other chapter in this book shows you how to use a design pattern. This chapter shows you how not to use one."

Here's a summary of his key points regarding the Singleton pattern:

* More Harm Than Good: Nystrom argues that despite its noble intentions, the Singleton pattern often does more harm than good, especially in game development. He notes that the Gang of Four (GoF) stressed its sparing use, a message that was often lost in translation to the game industry.

* Global State in Disguise: He highlights that the Singleton is essentially a global variable in disguise. While it ensures a single instance, it often leads to problems associated with global state, such as tightly coupled code, difficulty in testing, and reduced flexibility.

* Encourages Tight Coupling: By providing a global access point, Singletons encourage various parts of the codebase to directly depend on them. This makes the code harder to change and refactor because modifications to the Singleton can have widespread, unpredictable effects.

* Hinders Testability: Testing code that relies on Singletons can be challenging. Because Singletons are globally accessible and often difficult to mock or replace, it can be hard to isolate units of code for testing, leading to brittle tests or a lack of testing altogether.

* Limits Flexibility: Nystrom points out that while a class might currently only need one instance (e.g., a FileSystem or AudioSystem), hardcoding this limitation with a Singleton makes it difficult to change in the future. If a game later needs multiple file systems (e.g., for different save slots) or different audio contexts, a Singleton makes this a non-trivial refactoring.

* Alternatives Often Better: He suggests that often, what developers think they need a Singleton for can be better achieved with other patterns or simpler solutions. For instance, if the only goal is to have a single instance, a simple static class might suffice if global access is truly desired and initialization order isn't a complex concern. For managing services, he often advocates for the Service Locator pattern as a less problematic alternative that still provides global access without enforcing a single instance.

* Misconceptions about "Singular Resources": While the GoF suggested Singletons for "physical devices," Nystrom argues that few physical devices are truly singular in a way that necessitates a Singleton. For example, while you might only have one keyboard, a game could theoretically support multiple input devices. Using a Singleton prematurely prevents this flexibility.

* Lazy Initialization: He acknowledges that a well-implemented Singleton using lazy initialization (creating the instance only when it's first accessed) can save memory and CPU cycles and resolve some issues with static initialization order. However, he emphasizes that these benefits don't outweigh the pattern's fundamental architectural drawbacks.

* In essence, Nystrom's strong stance is that while the Singleton pattern can ensure a single instance and provide global access, its downsides (coupling, testability, flexibility) often make it an anti-pattern in game development, and developers should carefully consider alternatives before reaching for it.

---

Is that summary as good as actually reading the book? Probably not. Will people rely on that and skip buying my book? Almost certainly so.

bonoboTP OP
Transformed summaries are generally fair use already (or perhaps not even an issue of copyright). You can read plot summaries of novels and movies on Wikipedia, same with technical topics. The ideas are not protected by copyright; the artistic expression is. Certain technical ideas can be protected via patents, but even then what is protected is not the description of the idea but putting it into practice. Ideas that you're not supposed to re-summarize in your own words at all are things like trade secrets or classified information.

cmiles74
Are you sure? Or are owners deciding not to sue because they are seeing some benefit?

I believe copyright is always case-by-case. No one sues over plot summaries because they likely help sales. Summarize books or news articles with an LLM and you end up with the lawsuits we see today.

ethbr1
The specific difference is summarizing automatically, at scale, which is a novel technological possibility.

The previous balance of rights was created when summarizing took human time and proceeded at human pace.

Now, that's different and a new balance needs to be struck.
