
thethirdone
The fact that a number with digits `XXXABXXX` contains the term `A*10^(n+1) + B*10^n` is a very important relationship for doing any arithmetic. When you tokenize strings of digits into multi-digit tokens, you lose the position information within each token, and you make the relationships between tokens more complicated because the total number of possible token pairs increases.
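A minimal sketch of the place-value point, assuming a hypothetical left-to-right 3-digit grouping (real tokenizers split numbers in different ways). With single-digit tokens, every token maps directly to one (digit, exponent) pair in the sum `d*10^k`. With a token like `482`, three such pairs are hidden inside one opaque ID. The helper names are illustrative only.

```python
def digit_tokens(number: str) -> list[str]:
    """Single-digit tokenization: one token per digit."""
    return list(number)


def chunk_tokens(number: str, size: int = 3) -> list[str]:
    """Hypothetical fixed-width grouping, left to right (real tokenizers vary)."""
    return [number[i:i + size] for i in range(0, len(number), size)]


def place_values(number: str) -> list[tuple[int, int]]:
    """Pair each digit with the power of ten it is multiplied by."""
    n = len(number)
    return [(int(d), n - 1 - i) for i, d in enumerate(number)]


def reconstruct(number: str) -> int:
    """Rebuild the integer as sum(d * 10**k) over (digit, exponent) pairs."""
    return sum(d * 10 ** k for d, k in place_values(number))


# Adjacent digits A, B at exponents n+1 and n contribute A*10^(n+1) + B*10^n.
number = "48213"
print(digit_tokens(number))   # ['4', '8', '2', '1', '3']
print(chunk_tokens(number))   # ['482', '13']
print(place_values(number))   # [(4, 4), (8, 3), (2, 2), (1, 1), (3, 0)]
print(reconstruct(number))    # 48213
```

With single digits, each token already lines up one-to-one with a term of the sum. With chunks, the model has to learn that `482` means `4*10^2 + 8*10 + 2` before it can use that structure.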

For example, for a super simple model to learn 3-digit multiplication with 3-digit tokens, it would need to see at least one example for each token to get ANY information about what number that token represents. With single digits, by contrast, you only need examples where each digit appears in each position. Obviously, we would hope to have plenty of data, but I would expect better generalization from models that need to rely less on memorization.
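A rough count of that coverage argument, under the simplifying assumption that the model must see each symbol at least once before it can learn anything about it. The 3-digit scheme is assumed to give every operand 0..999 its own token. This is only an illustration of the counting, not a claim about how any particular model learns.

```python
def chunk_vocab(max_value: int) -> set[str]:
    """Assumed 3-digit grouping: every operand 0..max_value is its own token."""
    return {str(n) for n in range(max_value + 1)}


def digit_position_pairs(max_value: int) -> set[tuple[str, int]]:
    """Distinct (digit, exponent) pairs across operands 0..max_value.

    The exponent is counted from the right, so it is the digit's place value.
    """
    pairs = set()
    for n in range(max_value + 1):
        for exponent, digit in enumerate(reversed(str(n))):
            pairs.add((digit, exponent))
    return pairs


chunks = chunk_vocab(999)
pairs = digit_position_pairs(999)
print(len(chunks))         # 1000 distinct tokens, each needing its own example
print(len(pairs))          # 29 (the hundreds place never holds a leading 0)
print(len(chunks) ** 2)    # 1000000 ordered operand-token pairs for a*b
print(10 ** 2)             # 100 ordered single-digit pairs
```

Under these assumptions, covering every digit in every position takes a few dozen cases, while covering every 3-digit token takes a thousand, and the number of token pairs grows with the square of the vocabulary.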

On the other hand, I can see a few reasons why grouped digits could be better, but they are more complicated than the reason above, so by Occam's Razor my intuition says single digits should be better.

