It wouldn't be OpenAI holding copyright - it would be no one holding copyright.
Can you replay all of your prompts exactly the way you wrote them and get the same behaviour out of the LLM generated code? In that case, the situation might be similar. If you're prodding an LLM to give you a variety of resu
But significantly editing LLM generated code _should_ make it your copyright again, I believe. Hard to say when this hasn't really been tested in the courts yet, to my knowledge.
The most interesting question, to me, is who cares? If we reach a point where highly valuable software is largely vibe coded, what do I get out of a lack of copyright protection? I could likely write down the behaviour of the system and generate a fairly similar one. And how would I even be able to tell, without insider knowledge, what percentage of a code base is generated?
There are some interesting abuses of copyright law that would become more vulnerable. I was once involved in a case where the court decided that hiding a website's "disable your ad blocker or leave" popup was actually a case of "circumventing effective copyright protection". In this day and age, they might have had to produce proof that it was, indeed, copyright protected.
That said, you don't necessarily always have 100% deterministic build when compiling code either.
So in my understanding (not as a lawyer, but someone who's had to deal with legal issues around software a lot), if you _save_ all the inputs that will lead to the LLM creating pretty much the same system with the same behaviour, you could probably argue that it's a derivative work of your input (which is creative work done by a human), and therefore copyright protected.
If you don't keep your input, it's harder to argue because you can't prove your authorship.
It probably comes down to the details. Is your prompt "make me some kind of blog", that's probably too trivial and unspecific to benefit from copyright protection. If you specify requirements to the degree where they resemble code in natural language (minus boilerplate), different story, I think.
(I meant to include more concrete logic in my post above, but it appears I'm not too good with the edit function, I garbled it :P)
Public domain in, public domain out.
Copyright'd in, copyright out. Your compiled code is subject to your copyright.
You need "significant" changes to PD to make it yours again. Because LLMs are predicated on massive public data use, they require the output to PD. Otherwise you'd be violating the copyright of the learning data - hundreds of thousands of individuals.
If the prompt makes the output a derivative, then the rest is also derivative.
Patent trolls will extort you: the trolls will be using AI models to find "infringing" software, and then they'll strike.
¡There's no way AI can be cleanroom!
"4.1. Generally. Customer and Customer’s End Users may provide Input and receive Output. As between Customer and OpenAI, to the extent permitted by applicable law, Customer: (a) retains all ownership rights in Input; and (b) owns all Output. OpenAI hereby assigns to Customer all OpenAI’s right, title, and interest, if any, in and to Output."
https://openai.com/policies/services-agreement/