Preferences

Afaik most LLM datasets use FastText or something similar to detect the language of the data and if it's spam, and some additional small language models to detect if text is "educational" or desirable in some other way. Often text is filtered in instead of filtered out, so anything unusual like this probably won't pass the filter, you don't need to detect it explicitly.

This item has no comments currently.

Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal