Comment by findjashua

findjashua Dec 15, 2025 parent

providers' ToS explicitly states whether or not any data provided is used for training purposes. the usual that i've seen is that while they retain the right to use the data on free tiers, it's almost never the case for paid tiers

sotrusting Dec 15, 2025

Right, so totally cool to ignore the law but our TOS is a binding contract.

mc32 Dec 15, 2025

Yes, they can be sued for breach of contract. And it’s not a regular ToS but a signed MSA and other legally binding documents.

blibble Dec 15, 2025

the license on my open source code is a contract, and they ignored that

if they can get away with it (say by claiming it's "fair use"), they'll ignore corporate ones too

LPisGood Dec 15, 2025

If I were to go out on a limb, those companies spend more on tech companies than you and they have larger legal teams than you. That is a carrot and a stick for AI companies to follow the contract.

blibble Dec 15, 2025

no, it's not an incentive to follow the contract

it's an incentive to pretend as if you're following the contract, which is not the same thing

protocolture Dec 15, 2025

Where are they ignoring the law?

sotrusting Dec 15, 2025

https://www.reuters.com/business/environment/musks-xai-opera...

protocolture Dec 15, 2025

That's an allegation. Doesn't an allegation need to be tested?

yieldcrv Dec 15, 2025

people that say this tend to have a misinterpretation of copyright, and use all the court cases brought by large rights holders as validation

despite all 3 branches of the government disagreeing with them over and over again

sotrusting Dec 15, 2025 (dead)

torginus Dec 15, 2025

I bet companies are circumventing this in a way that allows them to derive almost all the benefit from your data, yet makes it very hard to build a case against them.

For example, in RL, you have a train set and a test set, which the model never sees but which is used to validate it - why not put proprietary data in the test set?

I'm pretty sure 99% of ML engineers would say this would constitute training on your data, but this is an argument you could drag out in courts forever.

Or alternatively - it's easier to ask for forgiveness than permission.

I've recently had an apocalyptic vision, that one day we'll wake up and find that AI companies have produced an AI copy of every piece of software in existence - AI Windows, AI Office, AI Photoshop etc.

Oarch Dec 15, 2025

Given the conduct we've seen to date, I'd trust them to follow the letter - but not the spirit - of IP law.

There may very well be clever techniques that don't require directly training on the users' data. Perhaps generating a parallel paraphrased corpus as they serve user queries - one which they CAN train on legally.

The amount of value unlocked by stealing practically ~everyone's lunch makes me not want to put that past anyone who's capable of implementing such a technology.

bdangubic Dec 15, 2025

it is amazing in almost 2026 there is anyone believing this… amazing

GCUMstlyHarmls Dec 15, 2025

I wonder how much wiggle room there is to collect now (to provide service, context history, etc), then later anonymise (somehow, to some level) and then train on it?

Also I wonder if the ToS covers "queries & interaction" vs "uploaded data" - I could imagine some tricky language in there that says we won't use your word document, but we may at some time use the queries you put against it, not as raw corpus but as a second layer examining what tools/workflows to expand/exploit.

danielheath Dec 15, 2025

“We don’t train on your data” doesn’t exclude metadata, training on derived datasets via some anonymisation process, etc.

There’s a range of ways to lie by omission, here, and the major players have established a reputation for being willing to take an expansive view of their legal rights.
