But those training the LLMs are still using the works, and not just to discuss them, which I think is the point of fair use doctrine. I guess I fail to see how it's any different from me using it in some other way? If I wanted to write a play very loosely inspired by Blood Meridian, it might be transformative, but that doesn't justify me pirating the book.

I tend to think copyright should be extremely limited compared to what it is now, but to me this ruling has no logic other than "it's ok for a corporation to use lots of works without permission but not for an individual to use a single work without permission." Maybe if they suddenly loosened copyright enforcement for everyone I might feel differently.

"Kill one man, and you are a murderer. Kill millions of men, and you are a conqueror." (An admittedly hyperbolic comparison, but similar idea.)


rcxdude
>If I wanted to write a play very loosely inspired by Blood Meridian, it might be transformative, but that doesn't justify me pirating the book.

I think that's the conclusion of the judge. If Anthropic were to buy the books and train on them, without extra permission from the authors, it would be fair use, much like if you were to be inspired by it (though in that case, it may not even count as a derivative work at all, if the relationship is sufficiently loose). But that doesn't mean they are free to pirate it either, so they are likely to be liable for that (exactly how that interpretation works with copyright law I'm not entirely sure: I know in some places that downloading stuff is less of a problem than distributing it to others because the latter is the main thing that copyright is concerned with. And AFAIK most companies doing large model training are maintaining that fair use also extends to them gathering the data in the first place).

(Fair use isn't just for discussion. It covers a broad range of potential use cases, and they're not enumerated precisely in copyright law AFAIK, there's a complicated range of case law that forms the guidelines for it)

tsumnia
I think the issue is that it's actually quite difficult to "unlearn" something once you've seen it. I'm speaking more from human-learning rather than AI-learning, but since AI is inspired by our view on nature, it will have similar qualities. If I see something that inspires me, regardless of whether I paid for it, I may not even know what specifically inspired me. If I sit on a park bench and an idea comes to me, it could come from a number of things - the bench, park, weather, what movie I watched last night, stuff on the wall of a restaurant while I was eating there, etc.

While humans don't have encyclopedic memories, our brain connects a few dots to make a thought. If I say "Luke, I am your father", it doesn't matter that the line is wrong; anyone who's seen Star Wars knows what I'm quoting. I may not be profiting from using that line, but that doesn't stop Star Wars from inspiring other elements of my life.

I do agree that copyright law is complicated and AI is going to create even more complexity as we navigate this growth. I don't have a solution on that front, just a recognition that AI is doing what humans do, only more precisely.

altruios
AFAIK (IANAL), copyright and exhaustive rights are completely different. Under copyright, once a book is purchased, that's it. Reselling the same or a transformed (e.g., highlighted) work 'used' is 100% legal, as is consuming it at your discretion (in your mind {a billion times}, in a fire, or (yes, even) in what amounts to a fancy calculator).

(that's all to say copyright is dated and needs an overhaul)

But that's taking a viewpoint of 'training a personal AI in your home', which isn't something that actually happens... The issue has never been the training data itself. Training an AI and 'looking at data and optimizing a (human understanding/AI understanding) function over it' are categorically the same, even if mechanically/biologically they are very different.

dragonwriter
> I tend to think copyright should be extremely limited compared to what it is now, but to me the logic of this ruling is illogical other than "it's ok for a corporation to use lots of works without permission but not for an individual to use a single work without permission."

That's not what the ruling says.

It says that training a generative AI system on one or more works is fair use when the system is not designed primarily as a direct replacement for a work, and that print-to-digital destructive scanning for storage and searchability is fair use.

These are both independent of whether one person or a giant company or something in between is doing it, and independent of the number of works involved (there's maybe a weak practical relationship to the number of works involved, since a gen AI tool that is trained on exactly one work is probably somewhat less likely to have a real use beyond a replacement for that work.)

fallingknife
But if you did pirate the book, and let's say it cost $50, and then you used it to write a play based on that book and made $1 million selling that, only the $50 loss to the publisher would be relevant to the lawsuit. The fact that you wrote a non-infringing play based on it and made $1 million would be irrelevant to the case. The publisher would have no claim to it.

The judge actually agreed with your first paragraph:

> This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use. There is no decision holding or requiring that pirating a book that could have been bought at a bookstore was reasonably necessary to writing a book review, conducting research on facts in the book, or creating an LLM. Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.

(But the judge continued that "this order need not decide this case on that rule": instead he made a more targeted ruling that Anthropic's specific conduct with respect to pirated copies wasn't fair use.)

tantalor
The analogy to training is not writing a play based on the work. It's more like reading (experiencing) the work and forming memories in your brain, which you can access later.

I'm allowed to hear a copyrighted tune, and even whistle it later for my own enjoyment, but I can't perform it for others without license.

AlienRobot
This is nonsense, in my opinion. You aren't "hearing" anything. You are literally creating a work, in this case, the model, derived from another work.

People need to stop anthropomorphizing neural networks. It's software, software is a tool, and a tool is used by a human.

adinisom
Humans are also created/derived from other works, trained, and used as a tool by humans.

It's interesting how polarizing the comparison of human and machine learning can be.

tantalor
It is easy to dismiss, but the burden of proof would be on the plaintiff to prove that training a model is substantially different than the human mind. Good luck with that.

AlienRobot
That makes no sense as a default assumption. It's like saying FSD is like a human driver. If it's a person, why doesn't it represent itself in court? What wages is it being paid? What are the labor rights of AI? How is it that the AI is only human-like when it's legally convenient?

What makes far more sense is saying that someone, a human being, took copyrighted data and fed it into a program that produces variations of the data it was fed. This is no different from a photoshop filter, and nobody would ever need to argue in court that a photoshop filter is not a human being.

protocolture
If I buy a book, and use it to prop up the table on which I build a door, I don't owe the author any additional money over what I paid for it.

If I buy a book, then as long as the product the book teaches me to build isn't a competing book, the original author should have no avenue for complaint.

People are really getting hung up on the computer reading the data and computing other data with it. It shouldn't even need to get to fair use. It's so obviously none of the author's business well before fair use.

klabb3
> But those training the LLMs are still using the works, and not just to discuss them, which I think is the point of fair use doctrine.

Worse, they’re using it for massive commercial gain, without paying a dime upstream to the supply chain that made it possible. If there is any purpose of copyright at all, it’s to prevent making money from someone else’s intellectual work. The entire thing is based on economic pragmatism, because just copying obviously does not deprive the creator of the work itself, so the only justification in the first place is to protect those who seek to sell immaterial goods, by allowing them to decide how it can be used.

Coming to the conclusion that you can “fair use” yourself out of paying for the most critical part of your supply makes me upset for the victims of the biggest heist of the century. But in the long term it can have devastating chilling effects, where information silos will become the norm, and various forms of DRM will be even more draconian.

Plus, fair use bypasses any licensing, no? Meaning even if today you clearly specify in the license that your work cannot be used in training commercial AI, it isn’t legally enforceable?

growse
> Worse, they’re using it for massive commercial gain, without paying a dime upstream to the supply chain that made it possible. If there is any purpose of copyright at all, it’s to prevent making money from someone else’s intellectual work.

This makes no sense. If I buy and read a book on software engineering, and then use that knowledge to start a career, do I owe the author a percentage of my lifetime earnings?

Of course not. And yet I've made money with the help of someone else's intellectual work.

Copyright is actually pretty narrowly defined for _very good reason_.

klabb3
> If I buy and read a book on software engineering

You're comparing yourself, an individual purchasing one copy of a book, to a multi-billion dollar company systematically ingesting them for profit without any compensation, let alone proportional compensation?

> do I owe the author a percentage of my lifetime earnings?

No, but you are a human being. You have a completely different set of rights from a corporation, or a machine. For very good reason.

growse
Does copyright law apply differently to humans vs. organisations?

> without any compensation,

Didn't Anthropic buy the books?

lurkshark
If you pirate a book on software engineering and then use that knowledge to start a career, do you owe the author the royalties they would be paid had you bought the book?

If the career you start isn't software engineering directly but instead re-teaching the information you learned from that book to millions of paying students, is the regular royalty payment for the book still fair?
