> Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs - https://arxiv.org/abs/2402.14903
Very interesting paper. It does make sense to me that R2L chunking would be better than L2R chunking. It doesn't actually study single-digit tokenization, though.
I am mostly interested in a direct comparison between an LLM with standard multi-digit ("wide") tokenization and one with single-digit tokenization. It would be nice to see a direct comparison between similarly trained models; otherwise it is very hard to get a definitive answer by comparing models with varying sizes, training time, and general strength.
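To make the chunking comparison concrete, here's a toy sketch of the two digit-grouping schemes (function and argument names are mine, not from the paper):

```python
def chunk_digits(num_str, direction="R2L", size=3):
    """Split a digit string into chunks of up to `size` digits.

    R2L groups from the right (like comma grouping: 1,234,567),
    L2R groups from the left.
    """
    if direction == "R2L":
        # Walk from the right so only the leftmost chunk can be short.
        chunks = []
        i = len(num_str)
        while i > 0:
            chunks.append(num_str[max(0, i - size):i])
            i -= size
        return list(reversed(chunks))
    else:  # L2R
        return [num_str[i:i + size] for i in range(0, len(num_str), size)]

print(chunk_digits("1234567", "L2R"))  # ['123', '456', '7']
print(chunk_digits("1234567", "R2L"))  # ['1', '234', '567']
```

The R2L grouping keeps place values aligned across chunks (each chunk is a thousands group), which is the intuition for why it helps arithmetic.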
> xVal: A Continuous Number Encoding for Large Language Models - https://arxiv.org/abs/2310.02989
I have seen this paper before, but hadn't paid attention to the P10 vs P100 analysis. It's not clear that the findings would be relevant to an LLM like GPT-4, though.
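For reference, this is roughly how I understand xVal's core idea: every number is replaced by a single [NUM] token, and that token's embedding is scaled by the numeric value. A minimal sketch, with class and argument names of my own choosing rather than the paper's code:

```python
import torch
import torch.nn as nn

class XValEmbedding(nn.Module):
    """Toy sketch of an xVal-style continuous number encoding:
    all numbers share one [NUM] token embedding, which is
    multiplied by the (normalized) numeric value."""

    def __init__(self, vocab_size, d_model, num_token_id):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.num_token_id = num_token_id

    def forward(self, token_ids, numeric_values):
        # token_ids: (batch, seq); numeric_values: (batch, seq),
        # holding the number at [NUM] positions and 1.0 elsewhere.
        x = self.embed(token_ids)
        scale = torch.where(
            token_ids == self.num_token_id,
            numeric_values,
            torch.ones_like(numeric_values),
        )
        return x * scale.unsqueeze(-1)
```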
Yeah
I believe there's another paper that demonstrates something similar for things like spelling, counting, etc., but I can't remember it.