It's not free. There is a license attached, one you are supposed to follow, and not doing so is against the law.
I'm not whining in this case, just pointing out "they gave it out for free" is completely false, at the very least for the GNU types. It was always meant to come with plenty of strings attached, and when those strings were dodged new strings were added (GPL3, AGPL).

If I had a photographic memory and I used it to replicate parts of GPLed software verbatim while erasing the license, I could not excuse it in court that I simply "learned from" the examples.

Some companies outright bar their employees from reading GPLed code because they see it as too high a liability. But if a computer does it, then suddenly it is A-OK, apparently according to the courts too.

If you're going to allow copyright laundering, at least allow it for both humans and computers. It's only fair.

> If I had a photographic memory and I used it to replicate parts of GPLed software verbatim while erasing the license, I could not excuse it in court that I simply "learned from" the examples.

Right, because you would have done more than learn: you would have gone past learning and used what you learned to reproduce the work.

It works exactly the same for an LLM. Training the model on content you have legal access to is fine. Afterwards, someone using that model to produce a replica of that content is engaged in copyright infringement.

You seem set on conflating the act of learning with the act of reproduction. You are allowed to learn from copyrighted works you have legal access to, you just aren't allowed to duplicate those works.

The problem is that it's not the user of the LLM doing the reproduction, the LLM provider is. The tokens the LLM is spitting out are coming from the LLM provider. It is the provider that is reproducing the code.

If someone hires me to write some code, and I give them GPLed code (without telling them it is GPLed), I'm the one who broke the license, not them.

We spread free software for multiple purposes, one of them being the free software ethos. People using that for training proprietary models is antithetical to such ideas.

It's also an interesting double standard, wherein if I were to steal OpenAI's models, no AI worshippers would have any issue condemning my action, but when a large company clearly violates the license terms of free software, you give them a pass.

> I were to steal OpenAI's models, no AI worshippers would have any issue condemning my action

If GPT-5 were "open sourced", I don't think the vast majority of AI users would seriously object.

OpenAI got really pissy about DeepSeek using other LLMs to train though.

Which is funny since that's a much clearer case of "learning from" than outright compressing all open source code into a giant pile of weights by learning a low-dimensional probability distribution of token sequences.

I can't speak for anyone else, but if you were to leak weights for OpenAI's frontier models, I'd offer to hug you and donate money to you.

Information wants to be free.

> The difference is that people who write open source code or release art publicly on the internet from their comfortable air conditioned offices voluntarily chose to give away their work for free

That is not nearly the extent of AI training data (e.g. OpenAI training its image models on Studio Ghibli art). But if by "gave their work away for free" you mean "allowed others to make [proprietary] derivative works", then that is in many cases simply not true (e.g. GPL software, or artists who publish work protected by copyright).

What? Over 183K books were pirated by these big tech companies to train their models. They knew what they were doing was wrong.
Perhaps you should Google the definition of metaphor before commenting.
