Comment by peebee67 - Hacker Neue

peebee67 Jan 10, 2025

Greedy and relentless as OpenAI's scraping may be, the fact that his web-based startup didn't have even a rudimentary robots.txt in place seems inexcusably naive. Correctly configuring this file has been one of the most basic steps of web design for as long as anyone can remember, and its absence doesn't speak highly of the technical acumen of this company.
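For anyone wondering what "rudimentary" means here, a rough sketch follows. It assumes the site wants to turn away OpenAI's crawler (which identifies itself as GPTBot) while leaving everything except a private area open to other agents. The `/private/` path, the example.com URLs, and the `is_allowed` helper are made up for illustration; the check uses Python's standard-library `urllib.robotparser` to confirm the rules say what you think they say.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, and keep all
# other crawlers out of /private/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("GPTBot", "https://example.com/models/1"))
    print(is_allowed("SomeOtherBot", "https://example.com/models/1"))
    print(is_allowed("SomeOtherBot", "https://example.com/private/x"))
```

Note that robots.txt is only a convention: it tells a well-behaved crawler what you want, it doesn't enforce anything.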

>“We’re in a business where the rights are kind of a serious issue, because we scan actual people,” he said. With laws like Europe’s GDPR, “they cannot just take a photo of anyone on the web and use it.”

Yes, and protecting that data was your responsibility, Tomchuck. You dropped the ball and are now trying to blame the other players.

mystified5016 Jan 10, 2025

OpenAI will happily ignore robots.txt

Or is that still my fault somehow?

Maybe we should stop blaming people for "letting" themselves get destroyed and maybe put some blame on the people actively choosing to behave in a way that harms everyone else?

But then again, they have so much money so we should all just bend over and take it, right?

peebee67 OP Jan 10, 2025

If OpenAI ignores a properly configured robots.txt and the licence also explicitly denies them use, then I'd guess the site owner has a viable civil action to extract compensation. But that isn't the case here at all, and while there are reports of OpenAI ignoring robots.txt, they certainly claim to respect the convention.

As for bending over: if you serve files, they request files, and you send them files, what exactly is the problem? That you didn't implement any kind of rate limiting? It's a web-based company and these things are just the basics.
