Comment by Joker_vD - Hacker Neue

Joker_vD Dec 12, 2025

But RISC-V has it?

    SLT     a2, a0, a1      # a2 = (a0 < a1)
    SLT     a0, a1, a0      # a0 = (a1 < a0)
    SUB     a0, a0, a2      # -1, 0 or 1 for original a0 <, ==, > a1
    RET
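In C, these three instructions correspond to the familiar branch-free three-way comparison idiom. A minimal sketch (the name cmp3 is illustrative only):

```c
/* Three-way comparison: returns -1, 0 or 1 for a < b, a == b, a > b. */
static int cmp3(long a, long b) {
    /* (a > b) matches SLT a0, a1, a0 and (a < b) matches SLT a2, a0, a1.
       Each is 0 or 1, so the difference lies in [-1, 1] and cannot overflow,
       unlike the tempting but broken "return a - b;". */
    return (a > b) - (a < b);
}
```

Because each comparison produces 0 or 1, no intermediate value can overflow regardless of the inputs.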

camel-cdr Dec 13, 2025

ARM:

    subs w0, w10, w11   // w0 = w10 - w11, sets the V flag on signed overflow
    b.vs trap           // branch if the overflow flag is set

RISC-V:

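    subw a0, t0, t1     # 32-bit difference, sign-extended to 64 bits
    sub  a1, t0, t1     # full 64-bit difference of the sign-extended inputs
    bne  a0, a1, trap   # the two differ only if the 32-bit subtraction overflowed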
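A C model of the RISC-V sequence, assuming t0 and t1 hold sign-extended 32-bit values (which is how the RV64 calling convention keeps 32-bit ints in registers). The name sub32_overflows is hypothetical; with GCC or Clang, __builtin_sub_overflow performs the same check directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models: subw a0, t0, t1 / sub a1, t0, t1 / bne a0, a1, trap */
static bool sub32_overflows(int32_t a, int32_t b) {
    /* subw: 32-bit difference, wrapping modulo 2^32 ... */
    uint32_t low = (uint32_t)a - (uint32_t)b;
    /* ... then sign-extended to 64 bits (portable bit trick, no implementation-defined casts) */
    int64_t subw = (int64_t)(low ^ UINT32_C(0x80000000)) - INT64_C(0x80000000);
    /* sub: exact 64-bit difference; cannot overflow for 32-bit inputs */
    int64_t sub = (int64_t)a - (int64_t)b;
    /* bne: the results differ exactly when the 32-bit result overflowed */
    return subw != sub;
}
```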

adrian_b Dec 12, 2025

Sorry, but this is the kind of ridiculous reply that RISC-V fans give when they are asked why their ISA lacks many of the features that any decent ISA has, features whose implementation cost is negligible, so there is no reason for them to be missing.

The workaround suggested by the RISC-V documentation consists of replacing a very large fraction of all the instructions in a program (integer additions, subtractions and comparisons are everywhere, close to half of all instructions) with sequences of 3 or more instructions, in order to approximate what any other CPU does with a single instruction.
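For reference, the general signed-addition overflow check given in the RISC-V unprivileged spec is a four-instruction sequence (add, slti, slt, bne). The sketch below models it in C; add_overflows is a hypothetical name, and the wrapping add goes through uint64_t because signed overflow is undefined behaviour in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* C model of:
 *   add  t0, t1, t2        # t0 = t1 + t2 (wraps)
 *   slti t3, t2, 0         # t3 = (t2 < 0)
 *   slt  t4, t0, t1        # t4 = (t0 < t1)
 *   bne  t3, t4, overflow
 */
static bool add_overflows(int64_t a, int64_t b, int64_t *sum) {
    /* Wrapping add; converting back to int64_t relies on the two's-complement
       behaviour that GCC and Clang define for out-of-range conversions. */
    int64_t s = (int64_t)((uint64_t)a + (uint64_t)b);
    *sum = s;
    /* Without overflow, adding a negative b makes the sum smaller than a and
       adding a non-negative b does not; a mismatch means the result wrapped. */
    return (b < 0) != (s < a);
}
```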

The other ridiculous workaround proposed to save RISC-V is that any high-performance implementation must make up for its missing features with instruction fusion.

Yes, the missing overflow-detection hardware can be replaced by multiplying the number of instructions for every operation, and the missing addressing modes can be synthesized by instruction fusion, but such implementations are extraordinarily more expensive than the normal solutions that other computers have used for three quarters of a century, ever since they were built with vacuum tubes.

Because of the extreme overhead of checking for overflow, I bet that most programs compiled for RISC-V do not check for overflow, which is not acceptable in reliable programs. (Even when using C/C++, I always compile with overflow checking enabled. That should have been the default, to be disabled only in specific cases where overflow can be proven impossible and the checks reduce performance.)
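A rough C picture of what checked signed arithmetic does at each addition. add_or_trap is a hypothetical helper; __builtin_add_overflow and __builtin_trap are real GCC/Clang builtins, not ISO C. Options such as -ftrapv or -fsanitize=signed-integer-overflow insert comparable checks automatically, though the exact code they emit differs.

```c
/* Checked signed addition: returns a + b, or traps if the result overflows int. */
static inline int add_or_trap(int a, int b) {
    int sum;
    if (__builtin_add_overflow(a, b, &sum))
        __builtin_trap();   /* abnormal termination (e.g. SIGILL or SIGTRAP, depending on target) */
    return sum;
}
```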

Joker_vD OP Dec 13, 2025

Oh, sorry, I thought you were saying that "RISC-V's comparison instructions don't properly handle the integer overflow that happens internally when they perform the comparison, i.e. it only has unsigned comparisons".

zozbot234 Dec 12, 2025

The cost of overflow checks turns out to be largely about missed optimizations, due to tighter constraints on how the program must behave when overflow occurs (e.g. preserving partial results). Having an overflow-check instruction in the ISA just doesn't matter all that much; it can even hurt in bignum computation (often cited as a case that favors overflow checks) by introducing unwanted instruction dependencies.
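For context on the bignum remark, here is a sketch of multi-limb addition in which the carry lives in an ordinary register value, recovered with unsigned comparisons (the way a flag-less ISA such as RISC-V would use sltu), rather than in a flags register. bignum_add is a hypothetical name, and this only illustrates the data flow; it does not by itself measure any dependency cost.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 64-bit limbs (least significant limb first).
   Returns the carry out of the top limb. r may alias a or b. */
static uint64_t bignum_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + b[i];   /* wraps modulo 2^64 */
        uint64_t c1 = s < a[i];     /* carry out of a[i] + b[i] */
        uint64_t t = s + carry;
        uint64_t c2 = t < s;        /* carry out of adding the incoming carry */
        r[i] = t;
        carry = c1 | c2;            /* at most one of c1 and c2 can be set */
    }
    return carry;
}
```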

adrian_b Dec 12, 2025

What you say about missed optimizations is true only when the compiler itself attempts to handle gracefully the cases where overflow would occur, instead of raising an exception.

That is not what normal overflow checking is. Normal overflow checking simply raises a specific exception when an integer overflow happens.

This has absolutely no effect on compiler optimizations. The compiler always generates code ignoring the possibility of exceptions. When an exception happens, control passes far away to the exception handler, which decides what to do, e.g. save debugging information and abort the offending program partially or totally, because an overflow is normally the consequence of an unforeseen program bug, and no action can make it possible to continue execution.

You should remember that there is nothing special about integer overflow: almost every instruction the compiler generates can raise an exception at run time. Any branch, memory-access, floating-point or vector instruction can raise a hardware exception. In recent CPUs, integer overflow does not raise an exception implicitly, so you have to insert a conditional branch, but this is irrelevant.

If your theory that the possibility of raising exceptions can inhibit compiler optimizations were true, there would be no compiler optimizations at all, because out of every 10 or so instructions a compiler generates, at least half can raise various kinds of exceptions in ways the compiler cannot predict. Adding integer overflow exceptions changes nothing.

camel-cdr Dec 13, 2025

Ok, let's test it then!

For testing, I used a custom QEMU plugin to measure the dynamic instruction count, dynamic uop count and dynamic instruction size. Every instruction with multiple register writebacks was counted as one uop per writeback, and to make the results more comparable, SIMD was disabled.
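The counting rule can be sketched as follows. This is not the plugin itself and does not use QEMU's plugin API; insn_info, tally and count_insn are hypothetical names, and counting instructions with zero writebacks (stores, branches) as one uop is an assumption.

```c
#include <stdint.h>

/* Hypothetical per-instruction record; a real plugin would obtain this from QEMU. */
typedef struct {
    unsigned size_bytes;   /* encoded length: 2 or 4 on RISC-V, 4 on AArch64 */
    unsigned writebacks;   /* number of destination registers written */
} insn_info;

typedef struct {
    uint64_t insns, uops, bytes;
} tally;

/* One uop per register writeback, but at least one uop per instruction (assumption). */
static void count_insn(tally *t, const insn_info *in) {
    t->insns += 1;
    t->uops  += in->writebacks > 1 ? in->writebacks : 1;
    t->bytes += in->size_bytes;
}
```

Under this rule, an AArch64 load pair (two destination registers) counts as two uops, while every RISC-V instruction counts as one, which matches the raw data below, where RISC-V uops equal instructions.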

I used this setup to run self-compiling single-file versions of chibicc (generating assembly) and tinycc (generating an object file), small C compilers of 9K and 24K LOC respectively. Both compilers were cross-compiled with clang-22 and benchmarked while cross-compiling themselves to x86.

Let's look at the impact of -ftrapv first. For chibicc at O3/O2/Os, -ftrapv increased the dynamic uops by 5.3%/5.1%/6.7% on RISC-V and by 5.1%/5.0%/6.4% on ARM. Interestingly, for tinycc the increase on RISC-V was only 1.6%/1.0%/1.0%, while ARM increased slightly more, by 1.6%/2.0%/1.3%.

In terms of dynamic instruction count, ARM executed 6%/15% fewer instructions than RISC-V for chibicc/tinycc. In terms of uops, RISC-V executes 6% more uops for tinycc, but ARM executes 0.5% more uops for chibicc. The dynamic instruction size, which estimates the pressure on the icache and fetch bandwidth, was 24%/10% lower on RISC-V for chibicc/tinycc.

Note that this did not model any instruction fusion on RISC-V, and only ARM's incrementing loads and load pairs were treated as multiple uops (to mirror Apple Silicon).

If the only fusion pair you implement is adjacent compressed sp-relative stores, RISC-V ends up with a lower uop count for both programs. These are trivial to implement, because you can simply interpret the two adjacent 16-bit instructions as a single 32-bit instruction, and compilers always emit them next to each other, in sorted order, in function prologues. You can do this directly in your RVC expander; it adds only minimal delay (zero with a trick), which is constant regardless of decode width.
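As a sketch of how simple the pattern check is, here is a hypothetical decoder helper that recognises two adjacent RV64C C.SDSP instructions ("sd rs2, uimm(sp)", 16 bits each) storing to neighbouring 8-byte stack slots. It only shows the matching step; how a core would then issue a fused store-pair uop is not modelled, and the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* C.SDSP: quadrant 2 (bits 1:0 == 10) with funct3 (bits 15:13) == 111 on RV64. */
static bool is_c_sdsp(uint16_t insn) {
    return (insn & 0x3) == 0x2 && (insn >> 13) == 0x7;
}

/* Offset field: uimm[5:3] sits in bits 12:10, uimm[8:6] in bits 9:7. */
static unsigned c_sdsp_offset(uint16_t insn) {
    return ((insn >> 10) & 0x7) << 3 | ((insn >> 7) & 0x7) << 6;
}

/* Two C.SDSPs targeting adjacent 8-byte slots, in either order. */
static bool fusable_sdsp_pair(uint16_t first, uint16_t second) {
    if (!is_c_sdsp(first) || !is_c_sdsp(second))
        return false;
    unsigned a = c_sdsp_offset(first), b = c_sdsp_offset(second);
    return a == b + 8 || b == a + 8;
}
```

For example, a typical prologue pair sd ra,24(sp) (0xEC06) followed by sd s0,16(sp) (0xE822) matches, while sd ra,24(sp) and sd ra,8(sp) (0xE406) do not.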

Raw data:

    chibicc/clang-O3-armv9:       insns: 419886184    uops:  450136257    bytes: 1679544736
    chibicc/clang-O3-armv9-trap:  insns: 450205913    uops:  474206409    bytes: 1800823652
    chibicc/clang-O3-rva23:       insns: 449328186    uops:  449328186    bytes: 1288202666
    chibicc/clang-O3-rva23-trap:  insns: 474623648    uops:  474623648    bytes: 1375991094
    chibicc/clang-O2-armv9:       insns: 421810039    uops:  451501004    bytes: 1687240156
    chibicc/clang-O2-armv9-trap:  insns: 451642152    uops:  475084965    bytes: 1806568608
    chibicc/clang-O2-rva23:       insns: 449625081    uops:  449625081    bytes: 1286452180
    chibicc/clang-O2-rva23-trap:  insns: 473682134    uops:  473682134    bytes: 1369720036
    chibicc/clang-Os-armv9:       insns: 457841653    uops:  489902437    bytes: 1831366612
    chibicc/clang-Os-armv9-trap:  insns: 497189616    uops:  523323893    bytes: 1988758464
    chibicc/clang-Os-rva23:       insns: 486216287    uops:  486216287    bytes: 1363135906
    chibicc/clang-Os-rva23-trap:  insns: 520889604    uops:  520889604    bytes: 1473263784


    tinycc/clang-O3-armv9:        insns: 115189179    uops:  126358884    bytes: 460756716
    tinycc/clang-O3-armv9-trap:   insns: 117139555    uops:  128361973    bytes: 468558220
    tinycc/clang-O3-rva23:        insns: 137035509    uops:  137035509    bytes: 427878586
    tinycc/clang-O3-rva23-trap:   insns: 139248009    uops:  139248009    bytes: 436548988
    tinycc/clang-O2-armv9:        insns: 115184314    uops:  126568360    bytes: 460737256
    tinycc/clang-O2-armv9-trap:   insns: 117651772    uops:  129195276    bytes: 470607088
    tinycc/clang-O2-rva23:        insns: 137362294    uops:  137362294    bytes: 420468990
    tinycc/clang-O2-rva23-trap:   insns: 138649335    uops:  138649335    bytes: 428680948
    tinycc/clang-Os-armv9:        insns: 130661270    uops:  144718253    bytes: 522645080
    tinycc/clang-Os-armv9-trap:   insns: 132574148    uops:  146565708    bytes: 530296592
    tinycc/clang-Os-rva23:        insns: 152798316    uops:  152798316    bytes: 452181732
    tinycc/clang-Os-rva23-trap:   insns: 154232874    uops:  154232874    bytes: 458257882
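Percentages can be derived from the raw data in two ways, and the two give noticeably different numbers. For chibicc O3 on RISC-V (449328186 uops without -ftrapv, 474623648 with it), the growth relative to the plain build is about 5.63%, while the extra uops as a share of the -ftrapv build are about 5.33%, which corresponds to the 5.3% quoted for that configuration. A small sketch of both conventions (function names are illustrative):

```c
#include <stdint.h>

/* Growth of the -ftrapv build relative to the plain build, in percent. */
static double pct_increase_over_base(uint64_t base, uint64_t trap) {
    return 100.0 * (double)(trap - base) / (double)base;
}

/* Extra work as a share of the -ftrapv build, in percent. */
static double pct_share_of_trap(uint64_t base, uint64_t trap) {
    return 100.0 * (double)(trap - base) / (double)trap;
}
```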
