Comment by hansworst - Hacker Neue

hansworst May 18, 2024

I think it’s questionable whether you can actually use this bit count to measure how much information the model took from the book. Those 1200 bits represent the ways in which this particular book differs from everything else the model has ingested. Similarly, if you read an entire book yourself, your brain stores just the salient parts, not the entire text, unless you have a photographic memory.

Take math or computer science, for example: some very important algorithms can be compressed to a few bits of information if you (or a model) thoroughly understand the surrounding theory. Would it not amount to IP infringement if a model regurgitated the relevant information from a patent application, even if that information were represented by under a kilobyte?
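
A rough way to see the "difference from everything else" point is to prime a compressor with prior text: it then needs far fewer bytes for a new text that mostly repeats what it has already seen. This is only a sketch, using zlib's preset dictionary as a loose stand-in for what a model already knows. `PRIOR`, `BOOK`, and the helper names are made up for illustration, and an LLM is of course not a DEFLATE compressor.

```python
import zlib

# Hypothetical stand-in for "everything else the model has ingested".
PRIOR = (
    b"Quicksort picks a pivot and partitions the array around it. "
    b"Binary search halves a sorted range at every step. "
    b"Dijkstra's algorithm relaxes edges in order of distance. "
    b"A hash table maps keys to buckets through a hash function. "
)

# Hypothetical "book": mostly material already in PRIOR, plus one new sentence.
BOOK = (
    b"Binary search halves a sorted range at every step. "
    b"A hash table maps keys to buckets through a hash function. "
    b"This sentence is the only part the prior has never seen."
)


def compressed_size(data: bytes, prior: bytes = b"") -> int:
    """Return the compressed size of data, optionally primed with prior text."""
    if prior:
        comp = zlib.compressobj(level=9, zdict=prior)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())


def roundtrip(data: bytes, prior: bytes) -> bytes:
    """Compress and decompress with the same prior, to show nothing is lost."""
    comp = zlib.compressobj(level=9, zdict=prior)
    blob = comp.compress(data) + comp.flush()
    decomp = zlib.decompressobj(zdict=prior)
    return decomp.decompress(blob) + decomp.flush()


alone = compressed_size(BOOK)
given_prior = compressed_size(BOOK, PRIOR)
print(f"raw: {len(BOOK)} bytes, alone: {alone} bytes, given prior: {given_prior} bytes")
```

The smaller number is not "how much of the book there is", only how much of it is new relative to the prior, and the full text still comes back out of `roundtrip`.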

ben_w May 18, 2024

I agree with what I think you're saying, so I'm not sure I've understood you.

I think this is all compatible with saying that ingesting an entire book is still:

> If you're taking a handful of word probabilities from every book ever written, then the portion taken from each work is very, very low

(Though I wouldn't want to bet either way on the "so courts aren't likely to care" that follows on from that quote: my not-legally-trained reading of the rules leaves me confused about how traditional search engines aren't a copyright violation.)
