Greedy and relentless as OpenAI's scraping may be, it seems inexcusably naive that his web-based startup didn't have even a rudimentary robots.txt in place. Correctly configuring this file has been one of the most basic steps of web design for as long as anyone can remember, and its absence doesn't speak highly of the company's technical acumen.

>“We’re in a business where the rights are kind of a serious issue, because we scan actual people,” he said. With laws like Europe’s GDPR, “they cannot just take a photo of anyone on the web and use it.”

Yes, and protecting that data was your responsibility, Tomchuck. You dropped the ball and are now trying to blame the other players.


OpenAI will happily ignore robots.txt

Or is that still my fault somehow?

Maybe we should stop blaming people for "letting" themselves get destroyed and put some blame on the people actively choosing to behave in a way that harms everyone else?

But then again, they have so much money so we should all just bend over and take it, right?

If they ignore a properly configured robots.txt and the licence also explicitly denies them use, then I'd guess the site owner has a viable civil action to extract compensation. But that isn't the case here at all, and while there are reports of them ignoring robots.txt, they certainly claim to respect the convention.
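
For reference, a minimal sketch of what "properly configured" could look like, and how to check it with Python's standard `urllib.robotparser`. The assumptions here: `GPTBot` is the user-agent token OpenAI documents for its crawler, while the rules, the `example.com` URLs, and the `is_allowed` helper are made up for illustration. This only shows what a compliant crawler would conclude; it does nothing against one that ignores the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, keep /private/ off-limits to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler honouring robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeOtherBot"):
        verdict = is_allowed(ROBOTS_TXT, agent, "https://example.com/models/1.jpg")
        print(agent, "allowed" if verdict else "blocked")
```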

As for bending over: if you serve files and they request files, then you send them files. What exactly is the problem? That you didn't implement any kind of rate limiting? It's a web-based company and these things are just the basics.
