GPT-2 can successfully learn to do multiplication with the standard tokenizer, though, via "Implicit CoT with Stepwise Internalization".
https://twitter.com/yuntiandeng/status/1836114401213989366
If anything, I'd think this indicates the barrier isn't tokenization (if it can do arithmetic, it can probably count as well) but something to do with "sequential dependencies" requiring CoT and explicit training. Which still leaves me puzzled: there are tons of papers showing that variants of GPT-2 trained the right way can do arithmetic, so where are the papers solving the "count the R's in strawberry" problem?
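For anyone who wants to see the tokenization side of this concretely, here's a minimal sketch using the tiktoken library's GPT-2 encoding (my choice of tool, not necessarily what any given chat model uses) to show that the model never receives "strawberry" as individual letters:

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("gpt2")  # GPT-2's BPE vocabulary

    word = "strawberry"
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    # The word arrives as a few multi-character chunks rather than 10
    # separate letters, so "how many r's?" has to be answered without
    # the r's ever appearing as distinct symbols in the input.
    print(pieces)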
GPT-2 tokenization was a demonstrable problem: https://www.beren.io/2023-02-04-Integer-tokenization-is-insa... (Prior HN discussion: https://www.hackerneue.com/item?id=39728870 )
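A quick way to poke at the integer-tokenization weirdness that post describes, again just a sketch with tiktoken's GPT-2 encoding:

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("gpt2")

    for n in ["1000", "1001", "2347", "999999"]:
        pieces = [enc.decode([i]) for i in enc.encode(n)]
        # Chunk boundaries vary from number to number, so numerically
        # adjacent integers can get structurally different token
        # representations.
        print(n, pieces)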
More recent research:
https://huggingface.co/spaces/huggingface/number-tokenizatio...
Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs: https://arxiv.org/abs/2402.14903
https://www.beren.io/2024-07-07-Right-to-Left-Integer-Tokeni...