I agree with what I think you're saying, so I'm not sure I've understood you.
I think all of this is still compatible with saying that, even when a model ingests an entire book:
> If you're taking a handful of word probabilities from every book ever written, then the portion taken from each work is very, very low
(Though I wouldn't want to bet either way on the "so courts aren't likely to care" that follows that quote: my interpretation of the rules, which is not legally trained, leaves me confused about why traditional search engines aren't a copyright violation.)
Take math or computer science, for example: some very important algorithms can be compressed to a few bits of information if you (or a model) have a thorough understanding of the surrounding theory. Would it not amount to IP infringement if a model regurgitated the relevant information from a patent application, even if that information could be represented in under a kilobyte?