It isn't at all obvious to me that the LLM can decide to blur its vision, so to speak, and see the tokens as tokens: it doesn't get to run a program on this data in some raw format. Even if it does attempt to write a program and run it in a sandbox, it would have to "remember" what it was given and regenerate it rather than copy it (I suppose a tool could give it access to the history of its input, but at that point that tool likely sees characters). I am 100% with andy99 on this: it isn't anywhere near as simple as you are making it out to be.
If each character were represented by its own token, there would be no need to "blur" anything, since the model would receive a 1:1 mapping between input vectors and individual characters. I never claimed that character-level reasoning is easy or simple for the model; I only said that it becomes theoretically possible to generalize ("potentially learn") without memorizing the character makeup of every token, which is required when using subword tokenization.
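To make that concrete, here's a toy sketch in Python. The vocabularies are made up for illustration (this isn't any real tokenizer's output); the point is only what information the input ids themselves carry in each setup.

    # Hypothetical vocabularies, purely for illustration.
    subword_vocab = {"straw": 1001, "berry": 1002}   # BPE-style merges: ids are opaque
    char_vocab = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}

    word = "strawberry"

    # Subword view: two ids, neither of which exposes its own spelling,
    # so "how many r's?" can't be read off the ids without memorized knowledge.
    subword_ids = [subword_vocab["straw"], subword_vocab["berry"]]
    print(subword_ids)                                # [1001, 1002]

    # Character-level view: one id per character, so counting reduces to
    # a position-wise comparison the model could in principle generalize.
    char_ids = [char_vocab[c] for c in word]
    print(sum(1 for i in char_ids if i == char_vocab["r"]))   # 3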
Please take another look at my original comment. I was being precise about the distinction between what's structurally possible to generalize and what has to be memorized.
In contrast, if the model were trained with a character-level vocabulary, where each character maps to a unique token, it would not need to memorize character counts for entire words. Instead, it could potentially learn a generalizable method for counting characters across all sequences, even for words it has never seen before.
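Put differently, the subword setup amounts to a memorized per-token lookup, while the character-level setup admits one rule that applies to any sequence, including words never seen in training. Another toy sketch, again with made-up vocabularies:

    # Subword case: effectively a memorized table, one entry per vocabulary token.
    memorized_r_counts = {"straw": 1, "berry": 2}
    print(sum(memorized_r_counts[t] for t in ["straw", "berry"]))   # 3

    # Character-level case: a single general rule, valid for unseen words too.
    def count_char(ids, target_id):
        return sum(1 for i in ids if i == target_id)

    char_vocab = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}
    novel = "brrrzle"                                 # a made-up word
    print(count_char([char_vocab[c] for c in novel], char_vocab["r"]))   # 3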
I'm not sure what you mean about them not "seeing" the tokens. They definitely receive a representation of each token as input.