Indeed, the company should purchase the books. If they obtain copies through a process that violates copyright, then that is a copyright violation in its own right.
The current decision does not rule on the legality of obtaining the books without purchasing.
However, that option was ultimately not pursued. Instead...
>> Anthropic spent many millions of dollars to purchase millions of print books, often in used condition. Then, its service providers stripped the books from their bindings, cut their pages to size, and scanned the books into digital form — discarding the paper originals. Each print book resulted in a PDF copy containing images of the scanned pages with machine-readable text (including front and back cover scans for softcover books). Anthropic created its own catalog of bibliographic metadata for the books it was acquiring. It acquired copies of millions of books, including of all works at issue for all Authors.
(from the ruling)
If the actual model was trained on the unauthorized copies, and they only bought the books after the fact, that doesn't retroactively cancel the initial copyright violation. As I understand it, they did not retrain the model using the OCR'd scans.
Such imperfect measures offer a compromise between "big tech can steal everything" and "LLMs trained on unpurchased books are illegal".
This applies not just to books but to any tragedy-of-the-commons situation where a "feeder industry" for training can be fatally undermined by the very LLM that depends on that industry for future training data.