
> If you rarely got to see letters and just saw fragments of words as something like Chinese characters (tokens), could you count the R's in arbitrary words well?

While this seems correct, I'm fairly sure I tried this when it was novel and observed that it could split the word into separate letters and then still count them wrong, which suggested something weird was happening internally.

I just now tried to repeat this, and it now counts the "r"'s in "strawberry" correctly (presumably enough examples of this specifically on the internet now?), but I did find it making the equivalent mistake with a German word (https://chatgpt.com/share/6859289d-f56c-8011-b253-eccd3cecee...):

  How many "n"'s are in "Brennnessel"?
But even then, having it spell the word out first fixed it: https://chatgpt.com/share/685928bc-be58-8011-9a15-44886bb522...

kbelder
Counting letters is such a dull test. LLMs generally have a hard time with this question because their input is tokenized before they receive it, so they never see individual letters and have to go through an involved reasoning process to recover them. It's like asking a color-blind person what color the street light is, and declaring him unintelligent because he sometimes gets the answer wrong.
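The tokenization point can be illustrated with a toy sketch. This is not a real LLM tokenizer; it's a greedy longest-match splitter over a made-up vocabulary, chosen just to show that once a word is chunked into multi-letter tokens, per-letter information is no longer directly visible:

```python
# Hypothetical vocabulary (illustrative only, not any real model's vocab).
VOCAB = ["str", "aw", "berry", "br", "enn", "nessel"]

def tokenize(word, vocab=VOCAB):
    """Greedy longest-match split; unknown spans fall back to single chars."""
    tokens, i = [], 0
    by_length = sorted(vocab, key=len, reverse=True)
    while i < len(word):
        match = next((t for t in by_length if word.startswith(t, i)), word[i])
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("strawberry"))   # ['str', 'aw', 'berry']
print(tokenize("brennnessel"))  # ['br', 'enn', 'nessel']

# The letter count is only recoverable by re-expanding the tokens,
# which is roughly what "spell it out first" forces the model to do:
print("".join(tokenize("brennnessel")).count("n"))  # 3
```

A model operating on the token IDs for `['br', 'enn', 'nessel']` has no direct view of the triple "n", which is consistent with the spell-it-out-first workaround described above.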
moomin
I mean, if you don’t want to include tests that LLMs are, by definition, bad at, why don’t we do the same thing for humans?
