I wasn't familiar with the "Wave32" term, but took "RDNA" to mean the smaller wavefront size. I've used both, and wave32 is still quite effective for CFD.
ROCm support for RDNA took like 2 years, maybe longer.
If you had actually been using both, you'd know that CDNA was the only supported platform on ROCm for what felt like an eternity. That's because CDNA was designed to be as similar to GCN as possible, so that ROCm could support it more easily.
--------
What I'm saying is that today, now that ROCm works on RDNA and CDNA, the two architectures can finally be unified into UDNA. And everyone should be happy with the state of code moving forward.
CDNA executes 64 threads per compute unit per clock tick. RDNA only executes 32 threads. CDNA is smaller, more efficient, more parallel and much higher compute than RDNA.
Furthermore, all ROCm code from GCN (and older) was written for Wave64, because historically AMD's architecture from 2010 through 2020 was Wave64. RDNA changed to Wave32 so that it could match NVidia and get slightly better latency characteristics (at the cost of bandwidth).
CDNA has more compute bandwidth and parallelism. RDNA is narrower, with lower latency and less parallelism. A GPU built out of 2048-bit compute (aka: 64 lanes x 32 bits wide, CDNA) is always going to have more bandwidth than one built out of 1024-bit compute (aka: 32 lanes x 32 bits wide) like RDNA.
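
To spell out the arithmetic behind those two numbers, here is a rough Python sketch. It only models the lane-count math from the paragraph above, not real hardware; the helper names (`wavefront_width_bits`, `wavefronts_needed`) and the 1000-work-item figure are made up for illustration.

```python
# Back-of-the-envelope model of wavefront width. This is NOT a hardware
# simulation; it only reproduces the lanes x bits arithmetic.

LANE_WIDTH_BITS = 32  # assumption: each lane operates on one 32-bit value


def wavefront_width_bits(lanes: int, lane_bits: int = LANE_WIDTH_BITS) -> int:
    """Total bits covered by one wavefront-wide operation."""
    return lanes * lane_bits


def wavefronts_needed(work_items: int, lanes: int) -> int:
    """Number of wavefronts required to cover work_items (ceiling division)."""
    return -(-work_items // lanes)


wave64_bits = wavefront_width_bits(64)  # Wave64 (GCN / CDNA style)
wave32_bits = wavefront_width_bits(32)  # Wave32 (RDNA style)

print(f"Wave64: {wave64_bits}-bit, Wave32: {wave32_bits}-bit")
print(f"1000 work items: {wavefronts_needed(1000, 64)} Wave64 wavefronts "
      f"vs {wavefronts_needed(1000, 32)} Wave32 wavefronts")
```

The same amount of work needs twice as many Wave32 wavefronts, which is the bandwidth-versus-latency tradeoff in a nutshell: each narrow wavefront is a smaller unit to schedule, but each instruction covers half as much data.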