> We have evidence of LLMs reproducing code from GitHub that was never ever released with a license that would permit its use. We know this is illegal.
What is illegal about it? You are allowed to read and learn from publicly available unlicensed code. If you use that learning to produce a copy of those works, that is infringement.
Meta clearly engaged in copyright infringement when they torrented books that they hadn't purchased. That was infringement before they ever started training on the data. That doesn't make the training itself infringement, though.
What kind of bullshit argument is this? Really? Works created using illegally obtained copyrighted material are themselves considered to be infringing as well. It's called derivative infringement. This is both common sense and law. Even if not, you agree that they infringed on the copyright of something close to all copyrighted works on the internet, and this sounds fine to you? The consequences and fines from that would kill any company if they actually had to face them.
That isn't true.
The copyright to derivative works is owned by the copyright holder of the original work. However, using illegally obtained copies to create a fair use transformative work does not taint your copyright of that work.
> Even if not, you agree that they infringed on copyright of something close to all copyrighted works on the internet and this sounds fine to you?
I agree that they violated copyright when they torrented books and scholarly articles. I don't think that counts as "close to all copyrighted works on the Internet".
> The consequences and fines from that would kill any company if they actually had to face them.
I don't actually agree that copyright infringement that causes no harm should be met with such steep penalties. I didn't agree when it was being done by the RIAA, and even though I don't like Facebook, I don't like it here either.
> It's a CRYSTAL CLEAR violation of the law
in the court of Reddit's public opinion, perhaps.
there is, as far as I can tell, no definite ruling about whether training is a copyright violation.
and even if there was, US law is not global law. China, notably, doesn't give a flying fuck. kill American AI companies and you will hand the market over to China. that is why "everyone just shrugs it off".
what do you picture happening if Western AI companies cease to operate tomorrow and fire all their researchers and engineers?
The idea that they are coming up with all this stuff from scratch is public relations BS. It's like Arnold Schwarzenegger never taking steroids: only believable if you know nothing about bodybuilding.
If a person "trains" on other creatives' works, they can produce output at the rate of one person. This presents a natural ceiling for the potential impact on those creatives' works, both regarding the amount of competing works, and the number of creatives whose works are impacted (since one person can't "train" on the output of all creatives).
That's not the case with AI models. They can be infinitely replicated AND train on the output of all creatives. A comparable situation isn't one human learning from another human, it's millions of humans learning from every human. Only those humans don't even have to get paid, all their payment is funneled upwards.
It's not one artist vs. another artist, it's one artist against an army of infinitely replicable artists.
What is the basis that an LLM should be included as a "creative type"?
LLMs seem to match.
To go into details, though: under copyright law there's a clause for "fair use" under "transformative" criteria. This allows things like satire and reaction videos to exist. So long as you don't replicate 1-to-1 in product and purpose, IMO it qualifies as tasteful use.
If an LLM reads a free Wikipedia article on Aladdin and adds a genie to its story, what copyright law do you think has been broken?