Still, kudos to the developers for making some things better (TCG actually does seem noticeably faster in 2.0.0) and congratulations on the 2.0.0 release!
My take on the KVM vs TCG split is simply that, in the end, open source projects are driven by the people who put in the work, and inevitably, if there's a set of people whose day job is to work on the project, it's going to tend to improve in the areas those people and companies need. And in general the people working on the emulation side of things seem to be happy enough with the performance levels we currently have: I see plenty of patches to add new ARM boards, to fix issues with devices, or to fix bugs in the linux-user emulation code. In comparison, there doesn't really seem to be much interest in TCG performance (and serious improvements in performance, though possible, would be a six-month or longer project).
The particular bugs you note are even further out in the cold, since x86 guest TCG in particular is pretty much orphaned and MIPS is not a great deal better (though I have been heartened to see recent contributions from Imagination).
What I had read as implied in the SO answer I linked to was that there was some other means of implementing cross-architecture emulation that was faster than QEMU.
In other words, there's a difference between being "slow" in absolute terms vs. being "slow" relative to other virtualization solutions.
From your answer, I gather that QEMU doesn't underperform relative to other emulation options.