I tend to think copyright should be far more limited than it is now, but to me the logic of this ruling boils down to "it's OK for a corporation to use lots of works without permission, but not for an individual to use a single work without permission." Maybe if copyright enforcement were suddenly loosened for everyone, I'd feel differently.
"Kill one man, and you are a murderer. Kill millions of men, and you are a conqueror." (An admittedly hyperbolic comparison, but similar idea.)
I think that's the conclusion of the judge. If Anthropic were to buy the books and train on them, without extra permission from the authors, it would be fair use, much like if you were to be inspired by them (though in that case, it may not even count as a derivative work at all, if the relationship is sufficiently loose). But that doesn't mean they are free to pirate them, so they are likely to be liable for that. (Exactly how that interpretation works with copyright law I'm not entirely sure: I know that in some places downloading is less of a problem than distributing to others, because the latter is the main thing copyright is concerned with. And AFAIK most companies doing large model training maintain that fair use also extends to gathering the data in the first place.)
(Fair use isn't just for discussion. It covers a broad range of potential use cases, and they're not enumerated precisely in copyright law AFAIK; there's a complicated body of case law that forms the guidelines for it.)
While humans don't have encyclopedic memories, our brains connect a few dots to make a thought. If I say "Luke, I am your father", it doesn't matter that the line isn't even right; anyone who's seen Star Wars knows what I'm quoting. I may not be profiting from using that line, but that doesn't stop Star Wars from inspiring other elements of my life.
I do agree that copyright law is complicated and AI is going to create even more complexity as we navigate this growth. I don't have a solution on that front, just a recognition that AI is doing what humans do, only more precisely.
(that's all to say copyright is dated and needs an overhaul)
But that's taking a viewpoint of 'training a personal AI in your home', which isn't something that actually happens... The issue has never been the training data itself. Training an AI and 'looking at data and optimizing a (human understanding/AI understanding) function over it' are categorically the same, even if mechanically/biologically they are very different.
That's not what the ruling says.
It says that training a generative AI system on one or more works is fair use, as long as the system isn't designed primarily as a direct replacement for a work, and that print-to-digital destructive scanning for storage and searchability is fair use.
These are both independent of whether one person or a giant company or something in between is doing it, and independent of the number of works involved (there's maybe a weak practical relationship to the number of works involved, since a gen AI tool that is trained on exactly one work is probably somewhat less likely to have a real use beyond a replacement for that work.)
> This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use. There is no decision holding or requiring that pirating a book that could have been bought at a bookstore was reasonably necessary to writing a book review, conducting research on facts in the book, or creating an LLM. Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.
(But the judge continued that "this order need not decide this case on that rule": instead he made a more targeted ruling that Anthropic's specific conduct with respect to pirated copies wasn't fair use.)
I'm allowed to hear a copyrighted tune, and even whistle it later for my own enjoyment, but I can't perform it for others without license.
People need to stop anthropomorphizing neural networks. It's software, and software is a tool, and a tool is used by a human.
It's interesting how polarizing the comparison of human and machine learning can be.
What makes far more sense is saying that someone, a human being, took copyrighted data and fed it into a program that produces variations of the data it was fed. This is no different from a photoshop filter, and nobody would ever need to argue in court that a photoshop filter is not a human being.
Worse, they’re using it for massive commercial gain, without paying a dime upstream to the supply chain that made it possible. If copyright has any purpose at all, it’s to prevent making money from someone else’s intellectual work. The entire thing is based on economic pragmatism: merely copying obviously does not deprive the creator of the work itself, so the only justification in the first place is to protect those who seek to sell immaterial goods, by allowing them to decide how their work can be used.
Coming to the conclusion that you can "fair use" yourself out of paying for the most critical part of your supply chain makes me upset for the victims of the biggest heist of the century. In the long term it could also have devastating chilling effects, where information silos become the norm and various forms of DRM grow even more draconian.
Plus, fair use bypasses any licensing, no? Meaning even if today you clearly specify in the license that your work cannot be used in training commercial AI, it isn’t legally enforceable?
This makes no sense. If I buy and read a book on software engineering, and then use that knowledge to start a career, do I owe the author a percentage of my lifetime earnings?
Of course not. And yet I've made money with the help of someone else's intellectual work.
Copyright is actually pretty narrowly defined for _very good reason_.
You're comparing an individual purchasing one copy of a book to a multi-billion-dollar company systematically ingesting books for profit without any compensation at all, let alone proportional compensation?
> do I owe the author a percentage of my lifetime earnings?
No, but you are a human being. You have a completely different set of rights from a corporation, or a machine. For very good reason.
If the career you start isn't software engineering directly but instead re-teaching the information you learned from that book to millions of paying students, is the regular royalty payment for the book still fair?
Personally I like to frame most AI problems by substituting a human (or humans) for the AI. Works pretty well most of the time.
In this case, if you hired a bunch of artists/writers who somehow had never seen a Disney movie and, to train them to make crappy Disney clones, made them watch all the movies, it would certainly be legal to do so, but only if you had legit copies in the training room. Pirating the movies would be illegal.
Though the downside is that it does create a training moat. If you want to create the super-brain AI that's conversant with the corpus of copyrighted human literature, you're going to need a training library worth millions.
Human time is inherently valuable, computer time is not.
The issue with LLMs is that they allow doing things at a massive scale that would previously have been prohibitively time-consuming. (You could argue otherwise, but then: how much electricity is worth one human life?)
If I "write" a book by taking another and replacing every word with a synonym, that's obviously plagiarism and obviously copyright infringement. How about also changing the word order? How about rewording individual paragraphs while keeping the general structure? It's all still derivative work but as you make it less detectable, the time and effort required is growing to become uneconomical. An LLM can do it cheaply. It can mix and match parts of many works but it's all still a derivative of those works combined. After all, if it wasn't, it would produce equally good output with a tiny fraction of the training data.
The outcome is that a small group of people (those making LLMs and selling access to their output) get to make huge amounts of money off of the work of a group that is several orders of magnitude larger (essentially everyone who has written something on the internet) without compensating the larger group.
That is fundamentally exploitative, whether the current laws accounted for that situation or not.
I see elements of that here. Buying copyrighted works not to be exposed to them and inspired, nor to utilize the author's talents, but to fuel a commercialization of sound-alikes.
Keep in mind, the Authors in the lawsuit are not claiming the _output_ is copyright infringement so Alsup isn't deciding that.
You're referencing Midler v. Ford Motor Co. in the 9th Circuit. That case is only binding within the Ninth Circuit, not the whole nation. Even then, it would take just one Supreme Court case to overturn it.
https://en.wikipedia.org/wiki/Mickey_Mouse#Walt_Disney_Produ...
I'm on the Air Pirates side for the case linked, by the way.
However, AI is not a parody. It's not adding to the cultural expression like a parody would.
Let's forget all the law stuff and these silly hypotheticals. Let's think of humanity instead:
Is AI contributing to education and/or culture _right now_, or is it trying to make money? I think the AI companies are trying to make money.
Says who?
> Is AI contributing to education and/or culture _right now_, or is it trying to make money?
How on earth are those things mutually exclusive? Also, whether or not it's being used to make money is completely irrelevant to whether or not it is copyright infringement.
Artists.
https://en.wikipedia.org/wiki/SAG-AFTRA
> How on earth are those things mutually exclusive?
Put those on a spectrum and rethink what I said.
> completely irrelevant to whether or not it is copyright infringement
_Again_, leave aside law minutiae and hypotheticals.
> Artists.
> https://en.wikipedia.org/wiki/SAG-AFTRA
Do you have a link that has their stance on how AI is harming culture? The best I could find is https://www.sagaftra.org/contracts-industry-resources/member...
I can't find anything in there or its linked articles about culture. I do find quite a bit about synthetic performers and digital replicas and making sure that people who do voice acting don't have their performance used to generate material that is done at a discounted rate and doesn't reimburse the performer.
https://www.sagaftra.org/ongoing-fight-ai-protections-makes-...
> Protective A.I. guardrails for actors who work in video games remain a point of contention in the Interactive Media Agreement negotiations which have been ongoing from October 2022 until last month’s strike. Other A.I.-related panels Crabtree-Ireland participated in included a U.S. Department of Justice and Stanford University co-hosted event about promoting competition in A.I., as well as a Vanderbilt University summit on music law and generative A.I. SAG-AFTRA Executive Vice President Linda Powell discussed the interactive negotiations and A.I.’s many implications for creatives during her keynote speech at an Art in the Age of A.I. symposium put on by Villa Albertine at the French Embassy.
> She said A.I. represents “a turning point in our culture,” adding, “I think it’s important that we be participants in it and not passengers in it ... We need to make our voices known to the handful of people who are building and profiting off of this brave new world.”
This doesn't indicate that it's good or bad, but rather that they want to make sure that people are in control of it and that people are compensated for the works created from their performances.
How many copies? They're not serving a single client.
Libraries need to have multiple e-book licenses, after all.
It changes the definition of what a "legal copy" is but the general idea that the copy must be legal still stands.
If you train an LLM on Harry Potter and ask it to generate a story that isn't Harry Potter, then it's not a replacement.
However, if you train a model on stock imagery and use it to generate stock imagery then I think you'll run into an issue from the Warhol case.
I wouldn't call it that. Goldsmith took a photograph of Prince, which Warhol used as a reference to create an illustration. Vanity Fair then chose to license Warhol's print instead of Goldsmith's photograph.
So, despite the artwork being visually transformative (silkscreen vs. photograph), the actual use was not transformed.
So if I, or an LLM, simply don't allow said extraction to occur, memorization and copying are not against the law.
This ruling doesn't say anything about the enforceability of a "don't train AI on this" contract, so even if the logic of this ruling became binding precedent (trial court rulings aren't), such clauses would be as valid afterward as they are today. But contracts only affect people who are parties to the contract.
Also, the damages calculations for breach of contract are different than for copyright infringement; infringement allows actual damages and infringer's profits (or statutory damages, if greater than the provable amount of the others), but breach of contract would usually be limited to actual damages ("disgorgement" is possible, but unlike with infringer's profits in copyright, requires showing special circumstances.)
Meta at least just downloaded ENGLISH_LANGUAGE_BOOKS_ALL_MEGATORRENT.torrent and trained on that.
> “We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages,” Judge Alsup wrote in the decision. “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for theft but it may affect the extent of statutory damages.”
This tells me Anthropic acquired these books legally afterwards. I was asking if, during that purchase, the seller could add a no-training clause to the sales contract.
https://en.wikipedia.org/wiki/First-sale_doctrine
> The doctrine was first recognized by the Supreme Court of the United States in 1908 (see Bobbs-Merrill Co. v. Straus) and subsequently codified in the Copyright Act of 1909. In the Bobbs-Merrill case, the publisher, Bobbs-Merrill, had inserted a notice in its books that any retail sale at a price under $1.00 would constitute an infringement of its copyright. The defendants, who owned Macy's department store, disregarded the notice and sold the books at a lower price without Bobbs-Merrill's consent. The Supreme Court held that the exclusive statutory right to "vend" applied only to the first sale of the copyrighted work.
> Today, this rule of law is codified in 17 U.S.C. § 109(a), which provides:
> Notwithstanding the provisions of section 106 (3), the owner of a particular copy or phonorecord lawfully made under this title, or any person authorized by such owner, is entitled, without the authority of the copyright owner, to sell or otherwise dispose of the possession of that copy or phonorecord.
---
If I buy a copy of a book, you can't limit what I can do with the book beyond what copyright restricts me.
LLMs may sometimes reproduce exact copies of chunks of text, but I would say it also matters that this is a marginal use case: it's not the main value proposition that drives LLM company revenues, it's not the use case that's marketed, and it's not how people actually use them in real life.
Humans, animals, hardware and software are treated differently by law because they have different constraints and capabilities.
Let's be real: humans have special treatment (more special than animals, since we can eat and slaughter animals but not other humans) because WE created the law, to serve humans.
So in terms of being fair across the board LLMs are no different. But there's no harm in giving ourselves special treatment.
https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
Maybe there's another big Google Books lawsuit that Google ultimately lost, but I don't know which one you mean in that case.
They did not have to; they had an alternative means available (and used it for many of the books): buying physical copies and destructively scanning them.
> and they will not win on their right to commercialize the results of training
That seems an unwarranted conclusion, at best.
> so what good is the Fair Use ruling
If nothing else, assuming the logic of the ruling is followed by the inevitable appeals court decision and becomes binding precedent, it provides a clear road to legally training LLMs on books without copyright issues (combination of "training is fair use" and "destructive scanning for storage and searchability is fair use"), even if the pirating of a subset of the source material in this case were to make Anthropic's existing products prohibited (which I think you are wrong to think is the likely outcome.)
Even if LLMs were actual human-level AI (they are not - by far), a small bunch of rich people could use them to make enormous amounts of money without putting in the enormous amounts of work humans would have to.
All the while "training" (= precomputing transformations which among other things make plagiarism detection difficult) on work which took enormous amounts of human labor without compensating those workers.
This is OK and fair use: Training LLMs on copyrighted work, since it's transformative.
This is not OK and not fair use: pirating data, or creating a big repository of pirated data that isn't necessarily for AI training.
Overall seems like a pretty reasonable ruling?