I can't wait, actually. For me it's less about privacy than about being offline.
Non-technicals don't know how LLMs work, and, more importantly, don't care about their privacy.
For a technology to be widely used, by definition, you need to make it appealing to the masses, and there is almost zero demand for private LLMs right now.
That's why I don't think local LLMs will win. There are narrow use cases where regulation can force local LLM usage (like for medical stuff), but overall I think services will win (as they always do).
You need some really expensive hardware to run a local LLM, and most of it is out of reach for the average user. The demand might simply be hidden, since these users either don't know about the option or don't want to spend the resources on it.
But I have hope that hardware costs will come down eventually, enough to reveal the demand for local LLMs.
After all, I'd prefer that my private questions to an LLM never be revealed.
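To put rough numbers on the hardware point: the memory needed just to hold a model's weights scales with parameter count times bytes per parameter, which is why quantization is what makes consumer hardware plausible. A back-of-the-envelope sketch (the 7B model size and the quantization levels are illustrative assumptions; real usage adds KV-cache and runtime overhead):

```python
# Rough memory needed just to hold the weights of a 7B-parameter model
# at different precisions (illustrative; excludes KV cache and overhead).
params = 7e9
bytes_per_param = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

for name, b in bytes_per_param.items():
    gib = params * b / 2**30
    print(f"{name}: ~{gib:.1f} GiB")  # fp16: ~13.0, int8: ~6.5, int4: ~3.3
```

At 4-bit quantization even a laptop with 8 GB of RAM can hold a 7B model, which is why falling hardware costs could surface demand that today looks nonexistent.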
We can have services but also private history/contexts. Those can be "local" (and encrypted).
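One way to sketch the "services plus local encrypted history" idea: keep the conversation log client-side, encrypted with a key that never leaves the device, and send only the current prompt (plus whatever context the user chooses) to the service. A minimal sketch using the third-party `cryptography` package; the key storage and message format here are my assumptions, not any real product's design:

```python
import json

from cryptography.fernet import Fernet

# Key generated once and stored only on the user's device
# (assumption: e.g. in the OS keychain, never uploaded to the service).
key = Fernet.generate_key()
box = Fernet(key)

history = [{"role": "user", "content": "my private question"}]

# Encrypt the serialized history before it ever touches disk or sync.
blob = box.encrypt(json.dumps(history).encode())

# Only the holder of the local key can read it back.
restored = json.loads(box.decrypt(blob).decode())
assert restored == history
```

Under this design the service only ever sees individual requests; the accumulated context that makes chat logs so sensitive stays encrypted at rest under a key the provider never holds.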
Like it or not, the judge's ruling sits comfortably within the framework of US law as it exists at present: since there's no reasonable expectation of privacy for chat logs sent to OpenAI, there's nothing to weigh against the competing interest of the active NYT case.
The third-party doctrine is worse than that: the data you handed over is not only no longer yours, it isn't really theirs either -- it's the government's. The company is forced to act as a government informant, without any warrant requirement. They can say "we will do our very best to keep your data confidential", and contractually bind themselves to do so, but hilariously, in the Supreme Court's wise and knowledgeable legal view, this does not create an "expectation of privacy", despite whatever vaults and encryption and careful employee vetting and armed guards stand between your data and unauthorized parties.
Implying that the recourse is to change the law.
Those precedents are also fairly insane and not even consistent with one another. For example, the government needs a warrant to read your mail in the possession of the Post Office -- not only a third party but actually part of the government -- but not the digital equivalent of this when you transfer some of your documents via Google or Microsoft?
This case is also not the traditional third party doctrine case. Typically you would have e.g. your private project files on Github or something which Github is retaining for reasons independent of any court order and then the court orders them to provide them to the court. In this case the judge is ordering them to retain third party data they wouldn't have otherwise kept. It's not clear what the limiting principle there would be -- could they order Microsoft to retain any of the data on everyone's PC that isn't in the cloud, because their system updater gives them arbitrary code execution on every Windows machine? Could they order your home landlord to make copies of the files in your apartment without a warrant because they have a key to the door?
My understanding is it's closer to something like: they cannot order a company to create new tools, but they can tell it not to destroy the data it already has. So MS merely having the ability to build a tool that extracts your data is not the same as MS already running that tool, collecting all of the data they store, and then being told to simply not destroy it. Similarly, VPNs that are not set up to create logs can't keep or hand over what they don't have.
Laws can be made to require the collection and storage of all user data by every online company, but we're not there -- yet. Many companies already do it on their own, and the user then decides if that's acceptable or not to continue using that service.
If the company had designed their service to not have the data in the first place, this probably never would have found its way to a judge. Their service would cost more, be slower, and probably be difficult to iterate on, since it's easier to hack things together in a fast-moving space than to build privacy/security-first solutions.
"the data they already have" means the data the user gave the company (no one is "giving" their files to their landlord) and that the company is in full possession of and now owns. Users in this case are not in possession or ownership of the data they gave away at this point.
If you hand out photocopies of the files in your apartment, the files in your apartment are still yours, but the copies you gave away to a bunch of companies are not. Those now belong to the companies you gave them to, and they can do whatever they want with them. So if they keep them and a judge orders that the documents not be destroyed (because of a legal hold), they would probably get into trouble if they went against the order.
Which is what I was trying to bring attention to: the company has a choice in what data (if any) it decides to collect, possess, and own. If they never collected/stored it, no one's privacy would be threatened.
https://en.wikipedia.org/wiki/Third-party_doctrine
If OpenAI doesn't succeed at oral argument, then in theory they could try for an appeal either under the collateral order doctrine or seeking a writ of mandamus, but apparently these rarely succeed, especially in discovery disputes.
To prevent that you need Congress to tell them no, but that creates a sort of priority inversion: The machinery designed to stop the government from doing something bad unless there is consensus is then enabling government overreach unless there is consensus to stop it. It's kind of a design flaw. You want checks and balances to stop the government from doing bad things, not enable them.
> once you voluntarily give your data to a third party-- e.g. when you sent it to OpenAI-- it's not yours anymore and you have no reasonable expectation of privacy about it.
Sorry for the layperson question, but does this then apply to my company's storage of confidential info on, say, Google Drive, even with an enterprise agreement?

Furthermore, if the third-party doctrine is upheld in its most naïve form, then this would breach the EU-US Data Privacy Framework. The US must ensure privacy protections equivalent to those under the GDPR for the agreement to be valid. The agreement also explicitly forbids transferring information to third parties without informing those whose information is transferred.
Users should stop sending information that shouldn't be public to US cloud giants like OpenAI.
The laws still look completely different in US and EU though. EU has stronger protections and directives on privacy and weaker supremacy of IP owners. I do not believe lawyers in any copyright case would get access to user data in a case like this. There is also a gap in the capabilities and prevalence of govt to force individual companies or even employees to insert and maintain secret backdoors with gag orders outside of court (though parts of the EU seem to be working hard to close that gap recently...).
[0]: Using it to derive baking recipes is not the same as using it to directly draft personal letters. Using it over a VPN with a pseudonymous account is not the same as using it from your home IP, registered to your personal email, with all your personal details filled out and your credit card linked. Running a coding agent straight on your workstation is different from sandboxing it yourself to ensure it can only access what it needs.
Based on what? Keep in mind that the data is to be used for litigation purposes only and cannot be disclosed except to the extent necessary to address the dispute. It can't be given to third parties who aren't working on the issue.
> There is also a gap in the capabilities and prevalence of govt to force individual companies or even employees to insert and maintain secret backdoors with gag orders outside of court
There's no secret backdoor here. OpenAI isn't being asked to write new code--and in fact their zero-data-retention (ZDR) API hasn't changed to record data that it never recorded in the first place. They were simply ordered to disable deletion functionality in their main API, and they were not forbidden from disclosing that change to their customers.
Start using services of countries who are unlikely to submit data to the US.