krackers parent
Until I see evidence that an LLM trained at e.g. the character level _CAN_ successfully "count Rs", I don't trust this explanation over any other hypothesis. I'm not familiar with the literature, so I don't know if this has been done, but I couldn't find anything with a quick search. Surely if someone had successfully done it they would have published it.

The math tokenization research is probably the closest.

GPT-2 tokenization was a demonstrable problem: https://www.beren.io/2023-02-04-Integer-tokenization-is-insa... (Prior HN discussion: https://www.hackerneue.com/item?id=39728870 )
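
You can see the problem the post describes directly. A minimal sketch, assuming the tiktoken package is installed (not from the post itself): print how GPT-2's BPE splits a few integers, which varies irregularly from number to number.

    # Inspect how GPT-2's BPE splits integers (requires: pip install tiktoken)
    import tiktoken

    enc = tiktoken.get_encoding("gpt2")
    for n in ["1000", "2521", "9999", "12345"]:
        ids = enc.encode(n)
        pieces = [enc.decode([i]) for i in ids]
        print(n, "->", pieces)  # splits differ irregularly across numbers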

More recent research (a toy sketch of the left-to-right vs right-to-left digit-chunking idea follows these links):

https://huggingface.co/spaces/huggingface/number-tokenizatio...

Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs: https://arxiv.org/abs/2402.14903

https://www.beren.io/2024-07-07-Right-to-Left-Integer-Tokeni...
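
As promised above, a toy illustration of the chunking difference (no model involved, and not code from any of the linked papers): with right-to-left chunking the lowest-order digits always align at a token boundary, which that line of work finds helps arithmetic.

    # Chunk a digit string into 3-digit "tokens", left-to-right vs right-to-left.
    def chunk_l2r(s, k=3):
        return [s[i:i + k] for i in range(0, len(s), k)]

    def chunk_r2l(s, k=3):
        out = []
        i = len(s)
        while i > 0:
            out.insert(0, s[max(0, i - k):i])
            i -= k
        return out

    print(chunk_l2r("1234567"))  # ['123', '456', '7']   -- ones digit position shifts
    print(chunk_r2l("1234567"))  # ['1', '234', '567']   -- ones digit always rightmost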

krackers OP
GPT-2 can successfully learn to do multiplication with the standard tokenizer, though, via "Implicit CoT with Stepwise Internalization".

https://twitter.com/yuntiandeng/status/1836114401213989366
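
As I understand the stepwise-internalization idea, the schedule looks roughly like this (a sketch only; function names and the "####" answer separator are illustrative, not the authors' code, and the real method removes CoT tokens gradually rather than whole steps):

    # Stepwise internalization, sketched: fine-tune on explicit CoT, then at
    # each stage drop more leading CoT content from the supervision target,
    # forcing the model to internalize those steps.
    def make_example(question, cot_steps, answer, stage):
        kept = cot_steps[stage:]          # stage s: drop the first s steps
        return " ".join([question] + kept + ["####", answer])

    q, cot, a = "12*34=", ["12*30=360", "12*4=48", "360+48=408"], "408"
    for s in range(len(cot) + 1):
        print(f"stage {s}: {make_example(q, cot, a, s)}")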

If anything I'd think this indicates the barrier isn't tokenization (if it can do arithmetic, it can probably count as well) but something to do with "sequential dependencies" requiring use of CoT and explicit training. Which still leaves me puzzled: there are tons of papers showing that variants of GPT-2 trained in the right way can do arithmetic, so where are the papers solving the "count the Rs in strawberry" problem?
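
For concreteness, here's what the counting task looks like at the token level (a sketch assuming tiktoken; I'm printing the split rather than asserting what it is): the letter counts are trivial to compute over the decoded strings, but the token IDs themselves don't expose per-token spelling, so the model has to have memorized it.

    # What "count the Rs in strawberry" looks like under the GPT-2 BPE.
    import tiktoken

    enc = tiktoken.get_encoding("gpt2")
    ids = enc.encode(" strawberry")
    pieces = [enc.decode([i]) for i in ids]
    print(pieces)                              # how BPE splits the word
    print(sum(p.count("r") for p in pieces))   # trivial here, opaque to the model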

anonymoushn
There are various papers about this, maybe most prominently the Byte-Latent Transformer.
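
A toy illustration of why byte-level approaches sidestep the issue (my gloss, not from the BLT paper): BLT groups bytes into learned patches, but the underlying representation is still bytes, so every letter is its own symbol and the count is directly present in the input sequence.

    # At the byte level, each letter of "strawberry" is one symbol.
    seq = list("strawberry".encode("utf-8"))
    print([chr(b) for b in seq])             # one symbol per letter
    print(sum(b == ord("r") for b in seq))   # -> 3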
