There is clearly a deviation between the amount of oversight the author thinks they provided and the actual level of oversight. This is clear by the fact that they couldn’t even explain the misattribution. They also mention that this is not their area of expertise.
In general, I think that it is a reasonable assumption that, if you couldn’t have written the code yourself, you’re in no position to claim you can ensure its quality.
This manager is directly providing an unrelated team with an unprompted 150-file giant PR dumped at once with no previous discussion. Upon questioning, he says the code has been written by an outside contractor he personally chose.
No one has onboarded this contractor to the team, and checking their online presence shows lots of media appearances, but not a single production project in their CV, much less long time maintenance.
A cursory glance at the files reveals that the code contains copypasted code from stackoverflow to the point that the original author's name is still pasted in comments. The manager can not justify this, but doesn't seem bothered by the fact, and insists that the contractors are amazing because he's been following them in social networks and is infatuated with their media there.
Furthermore, you check the manager's history in slack and you see 15 threads of him doing the same for other teams. The ones that have agreed to review their PRs have closed them for being senseless.
How would you be interacting with this guy?
That could be either regular incompetence or a "broken brain." It's more towards latter if the manager had no clue about what was going on, even after having it explained to him.
This guy is equivalent to a manager who hired two bozos to do a job, but insisted it was good because he had them check each other's work and what they made didn't immediately fall down.
Either that highlights someone who is incompetent or they are willfully being blasé. Neither bodes well for contributing code while respecting copyright (though mixing and matching code on your own private repo that isn't distributed in source or binary form seems reasonable to me).
[1]: https://github.com/ocaml/ocaml/pull/14369#issuecomment-35573...
[2]: https://github.com/ocaml/ocaml/pull/14369#issuecomment-35566...
It's actually capable of reasoning and generating derivative code and not just copying stuff wholesale.
See examples at the bottom of my post:
I am routinely looking into the folly implementation, sometimes into the libstdc++, sometimes into libc++, sometimes into boost or abseil etc. to find inspiration for problems that I tackle in other codebases. By the same standards, this should also be plagiarism, no? I manufacture new ideas by compiling existing knowledge from elsewhere. Literally every engineer in the world does the same. Why is AI any different?
From a pragmatic viewpoint as an engineer you assign the IP you create over to the company you work for so plagarism has real world potential to lose you your job at best. There's a difference between taking inspiration from something unrelated "oh this is a neat algorithmic approach to solving this class of problems" to "I need to implement this specific feature and it exists in this library so I'll lift it nearly verbatim".
Do you mind showing me some examples of that? That seems so implausible to me
Just for reference, here's another example of AI adding phantom contributors and the human just ignoring it or not even noticing: https://github.com/auth0/nextjs-auth0/issues/2432
You may not consider this problematic. But maintainers of this project sure do, given this was one of the immediate concerns of theirs.
That is why some people are forbidden to contribute to projects if their eyes have read projects with incompatible licenses, in case people go to copyright court.
But even if that hadn't been the case, what exactly would be the problem? Are you saying that I cannot learn from a copyrighted book written by some respected and known author, and then apply that knowledge elsewhere because I would be risking to be sued for copyright infringement?
Which raises the question how many other important incorrect details are buried in the 13k lines of code that you are unaware of and unable to recognise the significance of? And how much mantainer time would you waste being dismissive of the issues?
People have taken the copyright header as indicative of wider problems in the code.
LGPL is a license for distribution, the copyright of the original authors is retained (unless signed away in a contribution agreement, usually to an organization).
"Are you saying that I cannot learn from a copyrighted book written by some respected and known author, and then apply that knowledge elsewhere because I would be risking to be sued for copyright infringement?"
This was not the case here, so not sure how that is related in any way?
Naturally there are very flexible ones, very draconian ones, and those in the middle.
Most people get away with them, because it isn't like everyone is taking others to copyright court sessions every single day, unless there are millions at play.
That’s the quality he’s vouching for.
AI did paste Mark's copyright on all the files for whatever reason but it did not lift the DWARF implementation from OxCaml and paste it into the PR.
The PR wasn't one-shot either. I had to steer it to completion over several days, had one AI review the changes made by the other, etc. The end result is that the code _works_ and does what it says on the tin!
The AI wrote code which worked, for a problem the submitter had, which had not been solved by any human for a long time, and there is limited human developer time/interest/funding available for solving it.
Dumping a mass of code (and work) onto maintainers without prior discussion is the problem[1]. If they had forked the repo, patched it themselves with that PR and used it personally, would they have a broken brain because of AI? They claim to have read the code, tested the code, they know that other people want the functionality; is wanting to share working code a "broken brain"? If the AI code didn't work - if it was slop - and they wanted the maintainers to fix it, or walk them through every step of asking the AI to fix it - that would be a huge problem, but that didn't happen.
[1] copyrightwashing and attribution is another problem but also not one that's "broken brain by the existence of AI" related.