Including RTL-LTR flips, character substitutions, etc.? I think Unicode is vast enough that it’s possible to evade any filter and still look textlike enough to the end user, and how could you possibly know if it’s really a Greek question mark or if they’re just trying to mess with your AI?
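
For the RTL-LTR flips specifically, the explicit directional controls are at least a small, enumerable set. A rough sketch of flagging them using only Python's standard `unicodedata` module; the function name `find_bidi_controls` is made up for illustration, and this deliberately does not cover lookalike substitutions or the implicit marks U+200E/U+200F:

```python
import unicodedata

# Bidirectional classes of the explicit embedding, override, and isolate controls.
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

def find_bidi_controls(text):
    """Return (index, code point) pairs for explicit bidi control characters."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
    ]

# U+202E is RIGHT-TO-LEFT OVERRIDE, which visually reverses the text after it.
print(find_bidi_controls("abc\u202edef"))
```

Legitimate mixed-direction text can contain these characters too, so a hit is a reason to look closer rather than proof of tampering.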

Afaik most LLM datasets use FastText or something similar to detect the language of the data and whether it's spam, plus some additional small language models to detect whether text is "educational" or desirable in some other way. Text is often filtered in rather than filtered out, so anything unusual like this probably won't pass the filter; you don't need to detect it explicitly.
Ultimately the AI will just learn those tokens are basically the same thing. You'll just be reducing the learning rate by some (probably tiny) amount.
I assume that anyone trying to "filter" the text could just render it and then OCR it.
This works for ASCII, and you could just “smush” these special Unicode chars into ASCII lookalikes but then your AI won’t be usable by people who actually use these chars as part of their language.
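
To make the "smush" tradeoff concrete, here is a minimal sketch using Python's standard `unicodedata` module. The function name `smush_to_ascii` is hypothetical, and this is only one crude way to do the folding (NFKD decomposition followed by dropping everything non-ASCII), not how any particular pipeline does it:

```python
import unicodedata

def smush_to_ascii(text: str) -> str:
    """Fold text to ASCII: decompose, then drop whatever is still non-ASCII."""
    # NFKD splits accented letters into base + combining mark and maps
    # compatibility forms (fullwidth letters, etc.) to their plain versions.
    decomposed = unicodedata.normalize("NFKD", text)
    # Anything without an ASCII decomposition is silently deleted here.
    return decomposed.encode("ascii", "ignore").decode("ascii")

# U+037E GREEK QUESTION MARK decomposes to an ordinary semicolon.
print(smush_to_ascii("x = 1\u037e"))
# Fullwidth Latin letters fold to ASCII letters.
print(smush_to_ascii("\uff46\uff55\uff4c\uff4c"))
# Accents are stripped, which already changes real words.
print(smush_to_ascii("caf\u00e9"))
# Greek letters have no ASCII decomposition, so the word disappears entirely.
print(smush_to_ascii("\u03ba\u03b1\u03bb\u03ac"))
```

The last two calls are the cost: the accent is lost and the Greek word is erased. Normalization alone also does not map cross-script lookalikes (such as a Cyrillic letter that resembles a Latin one) onto each other, so it is not a complete defense either.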
> and how could you possibly know if it’s really a Greek question mark or if they’re just trying to mess with your AI?

I mean, how could YOU possibly know if it's really a Greek question mark... context. LLMs are a bit more clever than you're giving them credit for.

I think the bigger problem is that if the dataset was sufficiently poisoned, LLMs could start producing Greek question marks in their output. Like if you could tie it to some rare trigger words you could then use those words to cause generated code not to compile (despite passing visual inspection).
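
Since visual inspection won't catch this, a mechanical check on generated code is cheap. A minimal sketch in Python, assuming the code is supposed to be pure ASCII (an assumption that fails for code with non-ASCII string literals or comments); `find_non_ascii` is a hypothetical helper name:

```python
import unicodedata

def find_non_ascii(source: str):
    """Report (line, column, code point, Unicode name) for each non-ASCII character."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # Fall back to a placeholder for characters without a name.
                name = unicodedata.name(ch, "UNKNOWN")
                findings.append((lineno, col, f"U+{ord(ch):04X}", name))
    return findings

# The first statement ends in U+037E, which renders like ';' in many fonts.
snippet = "int x = 1\u037e\nreturn x;"
for finding in find_non_ascii(snippet):
    print(finding)
```

Run on the snippet above, this flags the Greek question mark at line 1, column 10, and leaves the real semicolon on line 2 alone.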
