subs w0, w10, w11
b.vs trap
RISC-V:
subw a0, t0, t1
sub a1, t0, t1
bne a0, a1, trap
The workaround suggested by the RISC-V documentation consists of replacing a very large fraction of all the instructions of a program (because there are a lot of integer additions, subtractions and comparisons in any program, close to half of all instructions) with 3 or more instructions each, in order to approximate what any other CPU does with a single instruction.
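For readers who think in C rather than assembly, here is a minimal sketch of what the three-instruction RISC-V sequence above computes, assuming 32-bit operands that are already sign-extended to 64 bits. The function names are made up for illustration; `__builtin_sub_overflow` is the GCC/Clang builtin and is shown only for comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: mirrors the subw / sub / bne idea.
 * The 64-bit difference of two sign-extended 32-bit values is exact,
 * so it differs from the wrapped 32-bit difference only on overflow. */
bool sub32_overflows_wide(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a - (int64_t)b;            /* like `sub`  */
    /* Wrap in unsigned arithmetic, then reinterpret as signed.
     * The final conversion is implementation-defined in ISO C;
     * GCC and Clang wrap modulo 2^32. */
    int32_t narrow = (int32_t)((uint32_t)a - (uint32_t)b); /* like `subw` */
    return wide != (int64_t)narrow;                    /* like `bne`  */
}

/* Same question answered by the compiler builtin (GCC/Clang only). */
bool sub32_overflows_builtin(int32_t a, int32_t b) {
    int32_t result;
    return __builtin_sub_overflow(a, b, &result);
}
```

On a target with an overflow flag the builtin can compile down to a flag-setting subtract plus a conditional branch, which is the ARM pattern shown above; on RV64 a compiler is free to emit something like the wide comparison.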
The other ridiculous workaround proposed to save RISC-V is that any high-performance implementation must make up for its missing features with instruction fusion.
Yes, the missing hardware for overflow detection can be replaced by multiplying the number of instructions for any operation, and the missing addressing modes can be synthesized by instruction fusion, but such implementation solutions are far more expensive than the normal solutions that have been used in other computers for three quarters of a century, since they were made with vacuum tubes.
Because of the extreme overhead of checking for overflow, I bet that most programs compiled for RISC-V do not check for overflow, which is not acceptable in reliable programs (even when using C/C++, I always compile them with overflow checking enabled, which should have been the default option, to be disabled only in specific cases where it can be proven that overflow is impossible and the checks reduce the performance).
This is not what normal overflow checking is. Normal overflow checking just raises a specific exception when integer overflow happens.
This has absolutely no effect upon compiler optimizations. The compiler always generates code ignoring the possibility of exceptions. When an exception happens, control is passed far away to the exception handler, which decides what to do, e.g. to save debugging information and abort the offending program partially or totally, because an overflow is normally the consequence of an unforeseen program bug and there is no action that would allow execution to continue.
You should remember that there is nothing special about integer overflow; almost every instruction that the compiler generates can raise an exception at run time. Any branch instruction, any memory-access instruction, any floating-point instruction, any vector instruction can raise an exception due to hardware. In recent CPUs, integer overflow is not raised implicitly, so you have to insert a conditional branch, but this is irrelevant.
If your theory that the possibility of raising exceptions can influence compiler optimizations were true, there would exist no compiler optimizations, because out of every 10 or so instructions generated by a compiler at least half can raise various kinds of exceptions, in a manner completely unpredictable by the compiler. Adding integer overflow exceptions changes nothing.
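To make the "insert a conditional branch" point concrete, here is a rough sketch in C of an addition with an explicit check that transfers control to a far-away handler. `overflow_trap` and `trapping_add` are hypothetical names, and the handler simply reports and aborts; this is an illustration of the pattern, not a description of what any particular compiler emits for -ftrapv.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an exception handler:
 * record what happened, then abort the program. */
static void overflow_trap(const char *where) {
    fprintf(stderr, "integer overflow in %s\n", where);
    abort();
}

/* Addition with an explicit overflow check.
 * __builtin_add_overflow is a GCC/Clang builtin that returns true
 * when the mathematically exact sum does not fit in *sum. */
int trapping_add(int a, int b) {
    int sum;
    if (__builtin_add_overflow(a, b, &sum))
        overflow_trap("trapping_add"); /* cold path, never returns */
    return sum;                        /* hot path: one add, one branch */
}
```

The hot path is the add plus a branch that is essentially never taken, and everything interesting happens in the handler.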
For testing, I use a custom qemu plugin to calculate the dynamic instruction count, dynamic uop count, and dynamic instruction size. Every instruction with multiple register writebacks was counted as one uop per writeback, and to make the results more comparable, SIMD was disabled.
I used this setup to run self-compiling single-file versions of chibicc (assembling) and tinycc (generating object file), which are small C compilers of 9K and 24K LOC respectively. Both compilers were cross-compiled using clang-22 and were benchmarked cross-compiling themselves to x86.
Let's look at the impact of -ftrapv first. In chibicc O3/O2/Os the dynamic uops increased due to -ftrapv for RISC-V by 5.3%/5.1%/6.7%, and for ARM by 5.1%/5.0%/6.4%. Interestingly, in tinycc it only increased for RISC-V by 1.6%/1.0%/1.0%, while ARM increased slightly more with 1.6%/2.0%/1.3%.
In terms of dynamic instruction count, ARM needed to execute 6%/15% fewer instructions than RISC-V for chibicc/tinycc. Looking at the uops, RISC-V needs to execute 6% more uops in tinycc, but ARM needs to execute 0.5% more uops with chibicc. The dynamic instruction size, which estimates the pressure on icache and fetch bandwidth, was 24%/10% lower in RISC-V for chibicc/tinycc.
Note that this did not model any instruction fusion in RISC-V and only treated incrementing loads and load pairs as multiple uops (to mirror Apple Silicon).
If the only fusion pair you implement is adjacent compressed sp relative stores, then RISC-V ends up with a lower uop count for both programs. They are trivial to implement because you can just interpret the two adjacent 16-bit instructions as a single 32-bit instruction, and compilers always generate them next to each other and in sorted order in function prolog code. You can do this directly in your RVC expander; it only adds minimal additional delay (zero with a trick), which is constant regardless of decode width.
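As a rough illustration of how cheap the detection could be, here is a sketch in C of a predicate that recognizes two adjacent compressed sp-relative 64-bit stores (C.SDSP) whose offsets are 8 bytes apart. The field layout follows the C.SDSP encoding in the RISC-V C extension (funct3 = 111 in bits 15:13, uimm[5:3] in bits 12:10, uimm[8:6] in bits 9:7, rs2 in bits 6:2, opcode = 10 in bits 1:0). The function names and the decision to accept either ordering are my own assumptions, and a real decoder would go on to emit a fused store-pair op rather than just a yes/no answer.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 16-bit instruction is C.SDSP (funct3 = 111, op = 10). */
static bool is_c_sdsp(uint16_t insn) {
    return (insn & 0xE003u) == 0xE002u;
}

/* Byte offset from sp encoded in a C.SDSP instruction. */
static unsigned c_sdsp_offset(uint16_t insn) {
    unsigned imm5_3 = (insn >> 10) & 0x7u; /* bits 12:10 -> uimm[5:3] */
    unsigned imm8_6 = (insn >> 7) & 0x7u;  /* bits 9:7   -> uimm[8:6] */
    return (imm5_3 << 3) | (imm8_6 << 6);
}

/* Hypothetical fusion predicate: both halves are C.SDSP and their
 * stack slots are adjacent (8 bytes apart, in either order). */
bool can_fuse_sp_store_pair(uint16_t first, uint16_t second) {
    if (!is_c_sdsp(first) || !is_c_sdsp(second))
        return false;
    unsigned a = c_sdsp_offset(first);
    unsigned b = c_sdsp_offset(second);
    return a + 8 == b || b + 8 == a;
}
```

For example, 0xe406 (`sd ra,8(sp)`) followed by 0xe022 (`sd s0,0(sp)`) is the kind of prologue pair this accepts.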
Raw data:
chibicc/clang-O3-armv9: insns: 419886184 uops: 450136257 bytes: 1679544736
chibicc/clang-O3-armv9-trap: insns: 450205913 uops: 474206409 bytes: 1800823652
chibicc/clang-O3-rva23: insns: 449328186 uops: 449328186 bytes: 1288202666
chibicc/clang-O3-rva23-trap: insns: 474623648 uops: 474623648 bytes: 1375991094
chibicc/clang-O2-armv9: insns: 421810039 uops: 451501004 bytes: 1687240156
chibicc/clang-O2-armv9-trap: insns: 451642152 uops: 475084965 bytes: 1806568608
chibicc/clang-O2-rva23: insns: 449625081 uops: 449625081 bytes: 1286452180
chibicc/clang-O2-rva23-trap: insns: 473682134 uops: 473682134 bytes: 1369720036
chibicc/clang-Os-armv9: insns: 457841653 uops: 489902437 bytes: 1831366612
chibicc/clang-Os-armv9-trap: insns: 497189616 uops: 523323893 bytes: 1988758464
chibicc/clang-Os-rva23: insns: 486216287 uops: 486216287 bytes: 1363135906
chibicc/clang-Os-rva23-trap: insns: 520889604 uops: 520889604 bytes: 1473263784
tinycc/clang-O3-armv9: insns: 115189179 uops: 126358884 bytes: 460756716
tinycc/clang-O3-armv9-trap: insns: 117139555 uops: 128361973 bytes: 468558220
tinycc/clang-O3-rva23: insns: 137035509 uops: 137035509 bytes: 427878586
tinycc/clang-O3-rva23-trap: insns: 139248009 uops: 139248009 bytes: 436548988
tinycc/clang-O2-armv9: insns: 115184314 uops: 126568360 bytes: 460737256
tinycc/clang-O2-armv9-trap: insns: 117651772 uops: 129195276 bytes: 470607088
tinycc/clang-O2-rva23: insns: 137362294 uops: 137362294 bytes: 420468990
tinycc/clang-O2-rva23-trap: insns: 138649335 uops: 138649335 bytes: 428680948
tinycc/clang-Os-armv9: insns: 130661270 uops: 144718253 bytes: 522645080
tinycc/clang-Os-armv9-trap: insns: 132574148 uops: 146565708 bytes: 530296592
tinycc/clang-Os-rva23: insns: 152798316 uops: 152798316 bytes: 452181732
tinycc/clang-Os-rva23-trap: insns: 154232874 uops: 154232874 bytes: 458257882
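For anyone who wants to re-derive percentages from the raw data, here is a small C sketch with two hypothetical helpers. A relative increase can be taken against the baseline count or against the -ftrapv count, and the two give slightly different numbers, so it helps to be explicit about which one is meant.

```c
#include <stdint.h>

/* Increase from base to trap, as a percentage of the baseline count. */
double pct_of_base(uint64_t base, uint64_t trap) {
    return 100.0 * (double)(trap - base) / (double)base;
}

/* The same difference, as a percentage of the -ftrapv count. */
double pct_of_trap(uint64_t base, uint64_t trap) {
    return 100.0 * (double)(trap - base) / (double)trap;
}
```

Taking chibicc/clang-O3-rva23 as an example (449328186 uops without and 474623648 uops with -ftrapv), `pct_of_base` gives roughly 5.6% and `pct_of_trap` roughly 5.3%. The chibicc figures quoted above appear to line up with the second form, though that is an inference from the arithmetic and not something stated in the write-up.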