It's one of the funny things about the Raspberry Pi Pico W: the Infineon CYW43439 has an integrated ARM Cortex-M3 CPU, so the WiFi/BT chip is technically more advanced than the actual RP2040 (which uses Cortex-M0+ cores). It also has more built-in ROM/RAM than the Pico board provides for the RP2040 to use.
And yeah, you can't really buy sprite-based video chips anymore, and you don't even have to worry about stuff like "sprites per scanline" because you can get a proper framebuffer essentially for free. But at that point you might as well go further: use one microprocessor as the CPU, GPU, and FM synthesizer sound chip, and "just" add the logic to generate the actual video/audio signals.
An FPGA is really just the right tool for solving the video problem, though some projects use a microcontroller instead. That's a bit of a shame, because it undercuts the spirit of the whole design. If your video processor is orders of magnitude more powerful than the rest of the computer, you start to ask why not just implement the entire computer inside the video processor.