Source: Generative Deep Learning by David Foster, 2nd edition, published in 2023. From “Tokenization” on page 134.
“If you use word tokens: …. will never be able to predict words outside of the training vocabulary.”
"If you use character tokens: The model may generate sequences of characters that form words outside the training vocabulary."
And like I said, single-byte tokens very much are a part of word tokenisers, or to be precise, of their token selection. "Word tokeniser" is a misnomer in any case - they are word-piece tokenisers. English is simple enough that word pieces can be entire words. With languages that pack numerous suffixes, prefixes, and even infixes into one "word" (as defined by "one or more characters preceded or followed by a space" - the truth is more complicated than that), you have not so much "word tokenisers" as "subword tokenisers". A character tokeniser is just a special case of a subword tokeniser where the length of each subword is exactly 1.
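A minimal sketch of that distinction, not from the book: the toy piece vocabulary and the greedy longest-match loop below are illustrative assumptions, but they show how word-level and character-level tokenisation fall out as the two extremes of subword tokenisation.

```python
text = "unhappiness is unavoidable"

# Word tokeniser: split on whitespace; anything unseen in training
# becomes a single out-of-vocabulary token.
word_tokens = text.split()
print(word_tokens)        # ['unhappiness', 'is', 'unavoidable']

# Character tokeniser: a subword tokeniser whose pieces all have length 1.
char_tokens = list(text)
print(char_tokens[:11])   # ['u', 'n', 'h', 'a', 'p', 'p', 'i', 'n', 'e', 's', 's']

# Toy subword tokeniser: greedily take the longest known piece,
# falling back to single characters (single bytes would play the
# same fallback role for text outside the vocabulary).
pieces = {"un", "happi", "ness", "avoid", "able", "is"}

def subword_tokenise(word: str) -> list[str]:
    tokens, i = [], 0
    while i < len(word):
        # Try the longest candidate first, shrink until something fits.
        for j in range(len(word), i, -1):
            if word[i:j] in pieces or j - i == 1:
                tokens.append(word[i:j])
                i = j
                break
    return tokens

print([subword_tokenise(w) for w in word_tokens])
# [['un', 'happi', 'ness'], ['is'], ['un', 'avoid', 'able']]
```

Because the fallback always accepts a single character, the sketch can emit pieces for words it has never seen, which is exactly the property the character-token quote above describes.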