The key insight is that you can represent different features by vectors that aren't exactly perpendicular, just nearly perpendicular (for example between 85 and 95 degrees apart). If you tolerate such noise then the number of vectors you can fit grows exponentially relative to the number of dimensions.
12288 dimensions (GPT3 size) can fit more than 40 billion nearly perpendicular vectors.
12288 dimensions (GPT3 size) can fit more than 40 billion nearly perpendicular vectors.
[1]: https://www.3blue1brown.com/lessons/mlp#superposition