Apart from the legacy case, you need it for MMIO---KVM for ARM also has a mini parser for LDR/STR instructions.
x86 however has all sorts of wonderful read-modify-write instructions too. You need to support those, but it would still be a small subset of the full x86 instruction set if you all you want to support is processors newer than circa 2010.
Yeah, it still gets hit now and then. It should not get hit often in the typical steady state, though, which is why you can punt it to userspace with little performance penalty.
(I work on the custom VMM we run)
(We were talking about emulation-via-just-interpret-one-instruction in userspace in upstream QEMU the other day -- you'd want it for OSX hypervisor.framework support too, after all. And maybe for the corner cases in TCG where you'd otherwise emulate one instruction and throw away the cached translation immediately.)