I'm not an absolutist on piracy in either direction, but when X or Y megacorp and all its affiliates can claim to "sell" you goods and then whimsically restrict access to them, such that further, future whimsies let them take away your purchased products, I'd hardly blame anyone for pirating.
Also, a company like Meta can pirate over 80 fucking TB of ebook content for indirectly commercial purposes, have its chief lie about being aware of this, and an average person who just wants content without so much bullshit DRM lock-in hassle should feel guilty about their choice?
You can request physical books (inter-library loans), and they often offer ebooks as well, although the service is likely to be cumbersome and hard to use because of DRM (but if it is too complicated, you can still borrow the ebook legally on your phone and read a copy from "piracy" sites anywhere you want, with the benefit that the author will get royalties).
It isn't perfect, as in you may not find what you want to read or when you want to read it, but if it works for me, it may work for you as well.
They were stating a fact that some people may not know but should be informed of when using these services (for various personal reasons, not all linked to feeling guilty). As they said, they're not making any moral or ethical assertions about it.
These websites are piracy, and I've used them in the past, still use them, and will probably keep using them. No fuss.
Regarding corporate piracy for AI, I don't think it's just Meta.
Which, paradoxically, underscores the need for more and more intellectual practice, a key purpose of the access to culture we have valued for millennia.
(A similar confusion underlies the aforementioned idea that Meta did something wrong in processing texts: we can all access available texts.)
Or piracy is actually theft (as it supposedly is when individuals do it) and Meta did millions of counts of it and therefore should pay trillions in damages, be dissolved, have Zuck go to jail, or all three.
"Piracy" in that context means coming into possession of something you are not entitled to own. This latter point is thin, a stub meant only to say that the two are different things; the one above is not (it could be expanded, but the conclusion would not change).
This is an odd strawman to tilt at.
LLMs ingest works but do not regurgitate them, so the product can be considered transformative. From my understanding of these models, they do not retain the original works. (There are probably reasons for the companies to retain the original works, but that is an entirely different matter.) So equating a trained model to a copyright violation is akin to suggesting that the knowledge, rather than the content, is copyrightable. Do we really want to enter that territory?
The other route of attack is via how the materials were acquired. This can create problems from several perspectives. If companies had to purchase each work in order to train a model, the process would only be accessible to very well-financed corporations. Libraries could be affected as well, since they are essentially in the business of purchasing works (albeit for an entirely different purpose). If you allowed borrowed works to be used while training models, the notion of lending would likely come under attack. I'm not sure we want to go there either. Then there is the question of online materials that are freely available. What would protect them?
I'm not a fan of AI and I am even less of a fan of Meta. I would love to see them have the book thrown at them. I'm just uncomfortable with the potential repercussions of throwing the book at them.
There is a "right to learn". There is a "right to access". And there are values to pursue, and urgencies to tackle (a world collapsing on its own cognitive faults)...
Libgen, Anna's Archive, et al. do, however, provide a valuable service in maintaining access to works that are out of distribution or blocked by censorship.