This is a problem because we approach AI with no a priori assumptions about which variations on a pattern it should be able to generalize to. We just imagine there's some magic way to recognize any isomorphic representation and transfer our knowledge to the new variables, when in reality we can only recognize a transferred domain that differs in a narrow set of ways, like being upside down or drawn on a bent surface. The set of possible variations on a 2D platformer that we can generalize well enough to just pick up and play is a tiny subset of all the ways you could map the pixels on the screen onto something else without technically losing information.
We could probably build an AI that bakes in the sort of assumptions that let it easily generalize what it learns to Fourier-space representations of the same data, but then it probably wouldn't be good at generalizing the sorts of things we are good at generalizing.
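To make the "without technically losing information" point concrete, here's a minimal sketch (assuming NumPy, with a random array standing in for a game frame) of one such isomorphic remapping: the 2D Fourier transform is exactly invertible, so nothing is lost, yet the encoded frame would be unplayable to a human.

```python
# Minimal sketch (assuming NumPy): a lossless but humanly unplayable remapping.
import numpy as np

rng = np.random.default_rng(0)
frame = rng.random((240, 320))        # stand-in for a 2D platformer frame

encoded = np.fft.fft2(frame)          # isomorphic Fourier-space representation
decoded = np.fft.ifft2(encoded).real  # invert to recover the original frame

# Round trip is exact up to floating-point error: no information was lost,
# even though the encoded version is unrecognizable to a human player.
assert np.allclose(frame, decoded)
```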
My point (hypothesis, really) is that the ability to "generalize in general" is a fiction. We can't do it either. But the sorts of things we can generalize are exactly the sorts that tend to occur in nature anyway, so we never notice the blind spot: the cases we can't handle simply don't come up.
Except that's of course superficial nonsense. Position space isn't an accident of evolution, one encoding of spatial data among many. It's an extremely special encoding: the laws of physics are local in position space. What happens on the moon has almost no effect on what happens when I eat breakfast, but points arbitrarily far apart in momentum space do interact. Locality of action is a very, very deep physical principle, and it's absolutely central to our ability to reason about the world at all, to break it apart into independent pieces.
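You can see that asymmetry directly in the same toy setup as above (again assuming NumPy, with random stand-in data): a single-pixel edit is perfectly local in position space but perturbs every coefficient in momentum space.

```python
# Minimal sketch (assuming NumPy): locality in position vs. momentum space.
import numpy as np

rng = np.random.default_rng(0)
frame = rng.random((240, 320))

edited = frame.copy()
edited[120, 160] += 1.0  # one purely local change

# In position space, exactly one entry differs...
print(np.count_nonzero(edited - frame))                            # -> 1

# ...but in Fourier (momentum) space, every coefficient differs.
print(np.count_nonzero(np.fft.fft2(edited) - np.fft.fft2(frame)))  # -> 76800
```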
So I strongly reject your example. It makes no sense to present the frames of a video game in Fourier space; it's highly unnatural, for very profound reasons. Our difficulty stems entirely from the fact that our visual system is built for interpreting a world with local rules and laws.
I also see no reason an AI could easily transfer between the two representations. If it started from scratch, it could train on the Fourier-space data, but that's more akin to using a different set of eyes than to genuine transfer.