Comment by mgraczyk - Hacker Neue

But that view is wrong, the model outputs multiple tokens.

The right alternative view is that it's an immutable function from prefixes to a distribution over all possible sequences of tokens less than (context_len - prefix_len).

There are no mutable functions that cannot be viewed as immutable in a similar way. Human brains are an immutable function from input sense-data to the combination (brain adaptation, output actions). Here "brain adaptation" doing a lot of work, but so would be "1e18 output tokens". There is much more information contained within the latter

This item has no comments currently.