I think it’s questionable whether you can actually use this bit count to represent the amount of information from the book. Those 1200 bits represent the way in which this particular book is different from everything else the model has ingested. Similarly, if you read an entire book yourself, your brain will just store the salient bits, not the entire text, unless you have a photographic memory.

Take math or computer science, for example: some very important algorithms can be compressed to a few bits of information if you (or a model) already have a thorough understanding of the surrounding theory. Would it not amount to IP infringement if a model regurgitated the relevant information from a patent application, even if it is represented by less than a kilobyte?


I agree with what I think you're saying, so I'm not sure I've understood you.

I think this is all still compatible with saying that ingesting an entire book comes down to:

> If you're taking a handful of word probabilities from every book ever written, then the portion taken from each work is very, very low

(Though I wouldn't want to bet either way on the "so courts aren't likely to care" that follows on from that quote: my not-legally-trained reading of the rules leaves me confused about why traditional search engines aren't a copyright violation.)
