zachooz
If each character were represented by its own token, there would be no need to "blur" anything, since the model would receive a 1:1 mapping between input vectors and individual characters. I never claimed that character-level reasoning is easy or simple for the model; I only said that it becomes theoretically possible to generalize ("potentially learn") without memorizing the character makeup of every token, which is required when using subword tokenization.
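To make that concrete, here is a minimal sketch of the two views the model gets of the same word. The two-entry subword vocabulary is hypothetical, standing in for a real BPE merge table:

    word = "strawberry"

    # Character-level: one token per character, so the character makeup
    # is directly visible in the token sequence itself.
    char_tokens = list(word)          # ['s','t','r','a','w','b','e','r','r','y']
    print(char_tokens.count("r"))     # 3

    # Subword-level: the model only sees opaque IDs; which characters sit
    # inside each ID is not visible and has to be memorized during training.
    subword_vocab = {"straw": 1001, "berry": 1002}   # hypothetical merges
    subword_ids = [subword_vocab[piece] for piece in ("straw", "berry")]
    print(subword_ids)                # [1001, 1002] -- no 'r' in sight

With character tokens the count is a pure function of the input sequence; with subword IDs it can only be recovered if the model has memorized which characters "straw" and "berry" contain.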

Please take another look at my original comment; I was careful to draw the distinction between what the model can structurally generalize and what it must memorize.
