obscure-enigma
This research is too simplified and somewhat vague. It is in the inherent nature of language models (or any probabilistic model, for that matter) to compress information for better generalization, since there is a lower bound on how much loss they can incur while decoding the information. LLMs are indeed lossy compressors.
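
A rough sketch of the bound being invoked, under the standard source-coding view (the notation here is mine, not from the original comment): if tokens are drawn from a true distribution p and the model predicts q, then coding samples of p with q (e.g. via arithmetic coding) costs about the cross-entropy per token,

    L(q) ≈ H(p, q) = H(p) + D_KL(p || q) ≥ H(p),

with equality only when q = p. So the entropy H(p) of the source is a floor no model can beat, and an imperfect model pays at least the KL gap on top of it; that gap is the "loss" in the lossy-compressor picture.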