I think that, within the framework of "almost-orthogonal axes", you can still construct a vector with any desired mix of projections onto any combination of these axes?
No. You can fit an exponential number of almost-orthogonal vectors into the input space, but the number of not-too-similar probability distributions over output tokens is also exponential in the output dimension. This is fine if you only care about a small subset of distributions (e.g. those that only assign significant probability to at most k tokens), but if you pick any random distribution, it's unlikely to be represented well. Fortunately, this doesn't seem to be much of an issue in practice and people even do top-k sampling intentionally.
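A quick numerical sketch of the first half of that claim: sampling far more random unit vectors than the space has dimensions still gives near-zero pairwise dot products. The dimensions and counts below are arbitrary choices for illustration, not anything from a real model.

```python
import math
import random

random.seed(0)

def rand_unit(d):
    # Random direction: i.i.d. Gaussians, normalized to unit length.
    v = [random.gauss(0, 1) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

d = 512        # input-space dimension (illustrative)
n_vecs = 2000  # far more vectors than dimensions
vecs = [rand_unit(d) for _ in range(n_vecs)]

# Check |cosine similarity| over a random sample of pairs. In high
# dimensions these concentrate around 1/sqrt(d) ~ 0.044, so the
# vectors are almost orthogonal despite n_vecs >> d.
pairs = [(random.randrange(n_vecs), random.randrange(n_vecs))
         for _ in range(5000)]
max_dot = max(
    abs(sum(a * b for a, b in zip(vecs[i], vecs[j])))
    for i, j in pairs if i != j
)
print(max_dot)  # small, even though we packed ~4x more vectors than dims
```

The catch in the comment above is the second half: this packing only buys you exponentially many *directions*, not faithful representation of an arbitrary probability distribution over the whole vocabulary.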
However, I'm talking about the probability distribution over tokens.