Preferences

I struggle to see the incentive to do this, and I have similar thoughts about locally run models. The only use cases I can imagine are small jobs at scale, perhaps something like autocomplete integrated into your deployed application, or extreme privacy, honouring NDAs, etc.

Otherwise, if it's a short prompt or answer, a SOTA (state of the art) model will be cheap anyway, and if it's a long prompt/answer, it's way more likely to be wrong and a lot more time/human cost is spent on "checking/debugging" any issue or hallucination, so again SOTA is better.


"or for extreme privacy"

Or for any privacy/IP protection at all? There is zero privacy when using cloud-based LLMs.

Really only if you are paranoid. It's incredibly unlikely that the labs are lying about not training on your data for the API plans that offer it. Breaking trust with outright lies would be catastrophic to any lab right now. Enterprise demands privacy, and the labs will be happy to accommodate (for the extra cost, of course).
No, it's incredibly unlikely that they aren't training on user data. It's billions of dollars' worth of high-quality tokens and preference data that the frontier labs have access to; you think they would give that up for their reputation in the eyes of the enterprise market? LMAO. Every single frontier model is trained on torrented books, music, and movies.
Considering that they will make a lot of money with enterprise, yes, that's exactly what I think.

What I don't think is that I can take seriously someone's opinion on enterprise services' privacy after they write "LMAO" in caps lock in their post.

I just know many people here have complained about the very unclear way Google, for example, communicates what it uses for training data and which plan to choose to opt out of everything, or whether you (as a normal business) even can opt out. Given the whole volatile nature of this thing, I can imagine an easy "oops, we messed up" from Google if it turns out they were in fact using almost everything for training.

The second thing to consider is the whole geopolitical situation. I know companies in Europe are really reluctant to give US companies access to their internal data.

To be fair, we all know Google's terms are ambiguous as hell. It would not be a big surprise, nor an outright lie, if they did use it.

It's different if they proclaimed outright that they won't use it and then do.

Not that any of this would be right, but it wouldn't be a true betrayal.

On a related note, these terms are to me a great example of the success of the EU's GDPR, and of regulations on corporations in general. It's clear as day that additional protections are afforded to EU residents in these terms purely because of the law.
