Comment by quamserena

quamserena Nov 24, 2025

Including RTL-LTR flips, character substitutions etc? I think Unicode is vast enough that it’s possible to evade any filter and still look textlike enough to the end user. And how could you possibly know if it’s really a Greek question mark or if they’re just trying to mess with your AI?
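For readers unfamiliar with the "RTL-LTR flips" mentioned here, the following is a minimal sketch of how a filter might flag explicit bidirectional control characters. It assumes Python's standard `unicodedata` module; the function name `find_bidi_controls` and the chosen set of bidi classes are illustrative, not something anyone in the thread proposed.

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls.
# Assumption: these are the characters we care about. The implicit marks
# LRM/RLM (U+200E, U+200F) have classes "L" and "R" and are not caught here.
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, character name) for each explicit bidi control in text."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
    ]


print(find_bidi_controls("abc"))        # no controls present
print(find_bidi_controls("a\u202eb"))   # contains a right-to-left override
```

This only catches one narrow class of trick. It says nothing about lookalike substitutions, which is the harder part of the question.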

mhjkl Nov 24, 2025

Afaik most LLM datasets use FastText or something similar to detect the language of the data and whether it's spam, plus some additional small language models to detect if text is "educational" or desirable in some other way. Often text is filtered in instead of filtered out, so anything unusual like this probably won't pass the filter; you don't need to detect it explicitly.

Sabinus Nov 24, 2025

Ultimately the AI will just learn those tokens are basically the same thing. You'll just be reducing the learning rate by some (probably tiny) amount.

zahlman Nov 24, 2025

I assume that anyone trying to "filter" the text could just render it and then OCR it.

quamserena OP Nov 24, 2025

This works for ASCII, and you could just “smush” these special Unicode chars into ASCII lookalikes, but then your AI won’t be usable by people who actually use these chars as part of their language.
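One way to picture the "smush" trade-off is Unicode normalization. The sketch below assumes NFKC normalization from Python's standard `unicodedata` module as the smushing step; the helper name `smush` is hypothetical, and real pipelines may use something quite different (for example a confusables table).

```python
import unicodedata


def smush(text: str) -> str:
    """Fold compatibility and canonical variants together using NFKC.

    Assumption: NFKC stands in for whatever "smushing" a real pipeline does.
    """
    return unicodedata.normalize("NFKC", text)


# U+037E GREEK QUESTION MARK is canonically equivalent to U+003B SEMICOLON,
# so normalization replaces it with a plain ASCII semicolon.
print(smush("\u037e") == ";")

# Fullwidth Latin letters fold to their ASCII counterparts.
print(smush("\uff46\uff55\uff4c\uff4c") == "full")

# Cyrillic "а" (U+0430) looks like Latin "a" but is a distinct letter,
# so normalization leaves it alone.
print(smush("\u0430") == "a")
```

The third case shows the limit of this approach: normalization cannot fold cross-script lookalikes, and a filter that did fold them would also rewrite ordinary Cyrillic or Greek text, which is the usability problem raised above.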

pixl97 Nov 24, 2025

> and how could you possibly know if it’s really a Greek question mark or if they’re just trying to mess with your AI?

I mean, how could YOU possibly know if it's really a Greek question mark... context. LLMs are a bit more clever than you're giving them credit for.

quamserena OP Nov 24, 2025

I think the bigger problem is that if the dataset was sufficiently poisoned, LLMs could start producing Greek question marks in their output. Like if you could tie it to some rare trigger words you could then use those words to cause generated code not to compile (despite passing visual inspection).
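To make the "doesn't compile despite passing visual inspection" scenario concrete, here is a small sketch using Python as the target language (an assumption; the comment does not name one). The helper `compiles` and the sample snippet are illustrative only.

```python
GREEK_QUESTION_MARK = "\u037e"  # renders like ";" in most fonts


def compiles(source: str) -> bool:
    """Return True if Python accepts the source, False on a SyntaxError."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


clean = "x = 1; y = 2"
# Swap the ASCII semicolon for its Greek lookalike, as poisoned output might.
poisoned = clean.replace(";", GREEK_QUESTION_MARK)

print(compiles(clean))
print(compiles(poisoned))
print(clean == poisoned)
```

The two strings look the same on screen in most fonts, yet they are different strings and only the first is accepted by the parser.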
