AI writes most of the code for most new YC companies, as of this year.

I think this is less significant because:

1. Most of these companies are AI companies & would want to say that to promote whatever tool they're building

2. Selection effects: YC is looking to fund companies embracing AI

3. Building a greenfield project with AI to the quality of what you need to be a YC-backed company isn't particularly "world-class"

They’re not lying when they say they have AI write their code, so it’s not just promotion. They will live or die by this thesis. If present YC portfolio companies underperform the market in 5-10 years, that’s a strong signal for AI skeptics; if they outperform, that’s a strong signal the skeptics were wrong.

On point 3: you’re absolutely right. New startups have greenfield projects that are in-distribution for AI, which gives them faster iteration speed. That means new companies have a structural advantage over older ones, and I expect them to grow faster than tech startups that don’t do this.

Plenty of legacy codebases will stick around, for the same reasons they always do: once you’ve solved a problem, the worst thing you can do is rewrite the solution on a new architecture just for better devex. My prediction: if you want to keep the code-writing and office culture of the 2010s, get a job internally at a cloud computing company (AWS, GCP, etc.). High-reliability systems have less to gain from iteration speed; that’s why airlines and banks maintain their mainframes.

How do you know they’re not lying?
So they don't own the copyright to most of their code? What's the value then?
They do. Where did you get this? All the providers have clauses like this:

"4.1. Generally. Customer and Customer’s End Users may provide Input and receive Output. As between Customer and OpenAI, to the extent permitted by applicable law, Customer: (a) retains all ownership rights in Input; and (b) owns all Output. OpenAI hereby assigns to Customer all OpenAI’s right, title, and interest, if any, in and to Output."

https://openai.com/policies/services-agreement/

The outputs of AI are most likely in the public domain: the output of an automated process is public domain, and the companies claim fair use when scraping, which makes the input unencumbered too.

It wouldn't be OpenAI holding copyright - it would be no one holding copyright.

Courts have already leaned this way too, but who knows what'll happen when companies with large legal funds enter the arena.
So you're saying machine code is public domain if it's compiled from C? If not, why would AI generated code be any different?
That would be considered a derivative work of the C code, therefore copyright protected, I believe.

Can you replay all of your prompts exactly the way you wrote them and get the same behaviour out of the LLM-generated code? In that case, the situation might be similar. If you're prodding an LLM to give you a variety of results and picking the one you like, the analogy gets weaker.
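
(To make the reproducibility point concrete, here's a minimal sketch, assuming the official `openai` Python package and an API key in the environment; the model name and prompts are placeholders I made up. Even with temperature 0 and a fixed seed, the API only promises best-effort determinism, and exposes a system_fingerprint so you can tell when the backend changed under you.)

    # Sketch: attempt a reproducible LLM call (assumes `pip install openai`
    # and OPENAI_API_KEY set in the environment).
    from openai import OpenAI

    client = OpenAI()

    def generate(prompt: str):
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
            temperature=0,   # minimize sampling randomness
            seed=42,         # best-effort determinism only
        )
        # system_fingerprint changes when the serving stack changes,
        # which can alter outputs even with identical inputs and seed.
        return resp.choices[0].message.content, resp.system_fingerprint

    a, fp_a = generate("Reverse a linked list in C.")
    b, fp_b = generate("Reverse a linked list in C.")
    print(a == b, fp_a == fp_b)  # not guaranteed to print True True

If identical prompts aren't even guaranteed to produce identical code, the compiler analogy is weaker than it looks.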

But significantly editing LLM-generated code _should_ make it your copyright again, I believe. Hard to say, since this hasn't really been tested in the courts yet, to my knowledge.

The most interesting question, to me, is: who cares? If we reach a point where highly valuable software is largely vibe coded, what do I actually lose from the lack of copyright protection? I could likely write down the behaviour of the system and generate a fairly similar one. And how would I even be able to tell, without insider knowledge, what percentage of a codebase is generated?

There are some interesting abuses of copyright law that would become more vulnerable. I was once involved in a case where the court decided that hiding a website's "disable your ad blocker or leave" popup was actually a case of "circumventing effective copyright protection". In this day and age, they might have had to produce proof that it was, indeed, copyright protected.

Derivatives inherit.

Public domain in, public domain out.

Copyrighted in, copyrighted out. Your compiled code is subject to your copyright.

You need "significant" changes to PD material to make it yours again. Because LLMs are predicated on massive use of public data, the output has to be PD too; otherwise you'd be violating the copyright of the training data's authors, hundreds of thousands of individuals.

See the Monkey Selfie case: setting the stage for an automated process is not enough to claim copyright over a work.
No, and your comment is ridiculously bad faith. Courts ruled that outputs of LLMs are not copyrightable. They did not rule that outputs of compilers are not copyrightable.
What about patents - if you didn't use a cleanroom process, do you have no defence?

Patent trolls will extort you: the trolls will be using AI models to find "infringing" software, and then they'll strike.

There's no way AI can be cleanroom!

That explains the low quality of all the Launch HNs this year.
Stats/figures to back up the low-quality claim?
If you have them, post them.
YC companies have pretty much always been overhyped trivial bullshit. I'm not surprised it's even worse nowadays, but it's never been more than a dog and pony show.