brucehoult
2,182 karma
Previously software engineer working on compiler toolchains and runtime libraries at SiFive. Also involved in Working Groups developing future RISC-V extensions. Previously Samsung R&D (toolchains and runtime libraries for Android and Tizen), Mozilla (JavaScript JIT), and others. All opinions my own.

  1. > Qualcomm acquired Nuvia in order to bypass the licence fees charged by ARM

    Both Nuvia and Qualcomm had Arm Architecture licenses that allowed them to develop and sell their own Arm-compatible CPUs.

    There was no bypassing of license fees.

    If Qualcomm had hired the Nuvia engineers before they developed their core at Nuvia, and they developed exactly the same core while employed at Qualcomm, then there would be no question that everyone was obeying the terms of their licenses.

    Arm's claim rests on it being ok for Nuvia to sell chips of their own design, but not to sell the design itself, and not to transfer the design as part of selling the company.

  2. The X280 is nothing special as a CPU core. It's basically the U74 with added 512 bit vector unit (but only 256 bit ALU), which makes it pretty much equivalent to SpacemiT's X60 core in their K1/M1 SoCs.

    There is no X280 hardware available yet for general purchase. There is the HiFive Xara X280 announced in May, but that is believed to be available to SiFive licensees only. The SG2380 was going to have X280s as an NPU alongside P670 main cores, but that's been cancelled as a result of US sanctions on Sophgo. The PIC64-HPSC is a rad-hard chip using the X280 for NASA and other space customers, but will not be cheap -- the RAD750 PowerPC chip it is replacing reportedly costs $200,000 each.

  3. What mistakes?

    No one is ever going to design an ISA that is complete and finished forever on Day #1. There are always going to be new data types and new algorithms to support, e.g. the current rush to add "AI" support to all ISAs (NPUs, TPUs, whatever you want to call them).

    Arm has ARMv9-A following on from ARMv8-A, and they are already up to Armv9.7-A, in addition to numerous ARMv8.x enhancements.

    Intel/AMD have all kinds of add-ons to x86_64, and not even linear -- e.g. the now-here, now-gone AVX-512, finally here to stay (presumably) in x86-64-v4. And there is already APX and AVX10 to add to that.

  4. > Aarch64's ecosystem is huge

    ARMv8 hardware (other than Apple) only shipped 3-6 years before RV64GC/RVA20, and ARMv9 is only about two years before the equivalent RVA23 -- at least in SBCs/Laptops. Obviously ARMv8 hardware went into mobile devices a lot earlier, though it was often running 32 bit code for the first few years.

    It's nothing at all like the maturity lead x86 has over both.

  5. What version fragmentation?

    Pretty much everything coming out in 2026 -- including Ventana's Veyron V2 -- is RVA23.

    One profile to rule them all.

    Currently-shipping applications processors are either RVA20 (plus the B extension in practice) or RVA22 with V as a standard option.

    That's not fragmentation, it's just a standard linear progression. Each thing can run all the software from the previous thing:

        RVA20 (what e.g. Ubuntu 25.04 and earlier require)
        -> RVA20 + B
        -> RVA22
        -> RVA22 + V
        -> RVA23 (what Ubuntu 25.10 and later require)
  6. Quote, because unlike on Reddit I couldn't figure out how to do multi para > quotes with code here.

    ------

    Compressed pointers reduce the need for memory by storing pointers as 32-bit unsigned offsets relative to a base register. Decompressing the pointers just consists of adding the offset and register together. As simple as this sounds, it comes with a small complication on our RISC-V 64-bit port. By construction, 32-bit values are always loaded into the 64-bit registers as signed values. This means that we need to zero-extend the 32-bit offset first. Until recently this was done by bit-anding the register with 0xFFFF_FFFF:

        li   t3,1
        slli t3, t3, 32
        addi t3, t3, -1
        and  a0, a0, t3
    
    Now, this code uses the `zext.w` instruction from the Zba extension:

        zext.w a0, a0
    
    -----

    This is so strange. Does no one at Google know RISC-V? This has *never* needed more than...

        slli a0, a0, 32
        srli a0, a0, 32
    
    And if they're going to use `Zba`, and zero-extend it and then add it to another register, then why use a separate `zext.w` instruction and `add` instead of ...

        add.uw decompressed, compressed, base
    
    ... to zero extend and add in one instruction??

    After all, `zext.w` is just an alias for `add.uw` with the `zero` register as the last argument...

    They also could have always simply stored the 32 bit offset as signed and pointed the base register 2GB into the memory area instead of using x86/Arm-centric design.
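
    To make that concrete, here's a C sketch (the helper names are mine, not V8's) showing that all the decompression sequences above compute the same address. `reg` models what an RV64 `lw` leaves behind: the 32-bit offset sign-extended to 64 bits.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Old V8 sequence: build 0xFFFF_FFFF (li/slli/addi), AND, then add. */
    static uint64_t decompress_mask(uint64_t base, uint64_t reg) {
        return base + (reg & 0xFFFFFFFFull);
    }

    /* Two plain shifts: slli 32 / srli 32 zero-extend the low word. */
    static uint64_t decompress_shifts(uint64_t base, uint64_t reg) {
        return base + ((reg << 32) >> 32);
    }

    /* One Zba instruction: `add.uw dst, reg, base` zero-extends and adds. */
    static uint64_t decompress_adduw(uint64_t base, uint64_t reg) {
        return base + (uint64_t)(uint32_t)reg;
    }

    /* Signed-offset scheme: bias the base 2 GB into the region, store the
     * offset biased by -2 GB, and a plain 64-bit add is all that's needed. */
    static uint64_t decompress_signed(uint64_t base, uint32_t off32) {
        uint64_t mid_base = base + 0x80000000ull;
        int32_t  stored   = (int32_t)(off32 - 0x80000000u);
        return mid_base + (uint64_t)(int64_t)stored;
    }

    int main(void) {
        uint64_t base  = 0x100000000ull;    /* example heap base */
        uint32_t off32 = 0x80000004u;       /* offset with the sign bit set */
        uint64_t reg   = (uint64_t)(int64_t)(int32_t)off32;  /* after lw */
        uint64_t want  = base + off32;      /* the address we need */

        assert(decompress_mask(base, reg)     == want);
        assert(decompress_shifts(base, reg)   == want);
        assert(decompress_adduw(base, reg)    == want);
        assert(decompress_signed(base, off32) == want);
        return 0;
    }
    ```

    The last variant is the signed-base idea: the 2 GB bias moves into the one-off base register setup, and decompression becomes a plain `add` on any RV64 core, Zba or not.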

  7. No, not on a laptop with anything like a comparable number of cores.

    Any x86 or Apple Silicon laptop that can match the DC-ROMA II in QEMU will need around three times as many cores -- if the task even scales to that many cores -- and will cost a lot more.

    I tried compiling GCC 13 on my i9-13900HX laptop with 24 cores, and on Milk-V Megrez which is the same chip but only one of them (4 cores, not 8):

    on Megrez:

        real    260m14.453s
        user    872m5.662s
        sys     32m13.826s
    
    
    On docker/QEMU on i9:

        real    209m15.492s
        user    2848m3.082s
        sys     29m29.787s
    
    The x86 laptop was only 25% faster -- compared to an 8 core RISC-V it would be slower.

    And the x86 under QEMU used 3.2x more CPU time than the RISC-V did natively, so to keep up you'd need around that many times more "performance" cores than the RISC-V machine has.

    Or build Linux kernel 7503345ac5f5 (almost exactly a year old at this point) using RISC-V defconfig:

    i9-13900HX docker/qemu

        real    19m12.787s
        user    583m44.139s
        sys     10m3.000s
    
    Ryzen 5 4500U laptop docker/qemu (Zen2 6 cores, Win11)

        real    143m20.069s
        user    820m26.988s
        sys     24m33.945s
    
    Mac Mini M1 docker/qemu (4P + 4E cores)

        real    69m16.520s
        user    531m47.874s
        sys     12m28.567s
    
    VisionFive 2 (4x U74 in-order cores @1.5 GHz, similar to RPi 3)

        real    67m35.189s
        user    249m55.469s
        sys     13m35.877s
    
    Milk-V Megrez (4x P550 cores @1.8 GHz)

        real    42m12.414s
        user    149m5.034s
        sys     11m33.624s
    
    The cheap (~$50) VisionFive 2 is the same speed as an M1 Mac with qemu, or twice as fast as the 6 core Zen 2.

    The 4 core Megrez takes around twice as long as the 24 core i9 with qemu. Eight of the same cores in the DC-Roma II will match the 24 core i9 and be more than three times faster than the 8 core M1 Mac.

  8. Why "bad"? It seems to me it does exactly what it sets out to do.

    Obviously, if you just want a fast laptop with a long battery life and you don't care what is inside it then you should get a Mac, or possibly something with the latest Qualcomm SoC, or an x86.

    If so then this isn't for you anyway.

    Jeff's facts are, obviously, correct, but I really wish he'd drop all the snark. Just say right at the start "If you don't want this BECAUSE it's RISC-V then it's not for you -- wait for the 8-wide RVA23 machines in a year or so" and stick to the facts from there on.

    The people who are actually interested in something like this need a machine to work on for the next year, and this is by far the best option at the moment (unless you need RVV).

    So far, and for many purposes, it's the fastest RISC-V machine you can buy [1], and you can carry it around and even use it without mains power in a cafe or something for a while.

    I don't even know when I last wanted to use my laptop away from AC power for more than 2-3 hours. On the 24 core i9 the battery life is only slightly longer anyway -- about 5 hours of light editing and browsing in Linux -- but if I actually start doing heavy compiles, drawing 200W, it's dead really quickly.

    [1] the Milk-V Pioneer with 64 slower cores is faster for some things, but there isn't all that much that can really use more than 8 cores, even most software builds. And it's been out of production for a year, and costs $2500+ anyway.

  9. Heeey, how's the Cruz treating you? If it still is.

    I don't know why you'd ever want to pay a cent more for a 6502 or 8051 or AVR than for a RISC-V or ARM (e.g. Puya PY32F002A). Especially when the CH32V002/4/6 run on anything from 2V to 5V (plus a margin) which is pretty rare, and they don't need any external components.

    I don't know whether the M6809 designers were the first ever to analyse a body of real software to find instruction and addressing mode frequencies and the distribution of immediates in order to optimise the encoding of a new ISA -- in a way that the 8086 people clearly didn't [1] -- but I think they were the first to publish about it, and I was fascinated by their BYTE articles at the time.

    MSP430 is also a fun ISA. I just wish they were cheaper, and the cheap ones had more than 512 bytes of RAM. FRAM is funky. I also loooove the instruction encoding e.g. `add.w r10,r11` is `0x5A0B` where `5` is `add`, `A` is the src register (r10), `0` means reg-to-reg word size, `B` is the dst register (r11). Just beautiful. Far nicer for emulating on a 6502 or Z80 than Arm or RISC-V too. The R2/R3 constant generation is a bit whack though.

    [1] e.g. on one hand deciding it was worth squeezing a 5 bit offset from any of 4 registers into a 2-byte instruction, while also providing 8 and 16 bit offsets with 3 and 4 byte instructions. They were also confident enough to relegate the 6800's SEC/CLC/SEI/CLI/SEV/CLV to two-byte instructions (with a mask so you could do multiple at once). But not confident enough to do the same with DAA, or SEX. They kept the M6800 encoding for DAA (and for as much else as possible e.g. keeping the opcodes for indexed addressing, but expanding from one option to dozens), but SEX was new to them and they could have experimented with it.

  10. Dude. You've become a verb.
  11. He also runs a site with a bunch of different compilers and versions :p
  12. That's 1 byte smaller than `LDA #0`, but not faster. And you don't have enough registers to waste them -- being able to do `STZ` and the `(zp)` addressing mode without having to keep 0 in Z or Y were small but soooo convenient things in the 65C02.
  13. 65C02s are $8 now. That didn't stop me buying one when I was stuck at home during COVID. And a 6809 too.

    But forget AVR. Yeah, for a buck or so the ATTiny85 was my go-to small MCU five years ago, and the $5 328 for bigger tasks.

    But for the last three years both can be replaced by a 48 MHz 32 bit RISC-V CH32V003 for $0.10 for the 8 pin package (like ATTiny85, and also no external components needed) and $0.20 for the 20 pin package with basically the same number of GPIOs as the 328. At 2k RAM and 16K flash it's the same RAM and a little less flash than the ATMega328 -- but not as much as you'd think as RISC-V handles 16 and 32 bit values and pointers sooo much better.

    And now there are the CH32V002/4/5/6 with an enhanced CPU and more RAM and/or flash -- up to 8K RAM and 62K flash on the 006 -- still for around the same $0.10-$0.20 price:

    https://www.lcsc.com/product-detail/C42431288.html

  14. Not really. It looks like that in the C code, but in the generated machine code it'll just be a single `MULH` instruction giving (only) the upper 64 bits of the result, no shift needed.
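
    As a concrete sketch (my example, not the code under discussion): with GCC or Clang, the high half of a 64x64-bit multiply written via the compiler's `__int128` extension lowers on RV64 to a single `mulhu` (or `mulh` for signed operands); the `>> 64` in the source never becomes a real 128-bit shift.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* High 64 bits of an unsigned 64x64 -> 128-bit multiply.  On RV64 with
     * GCC/Clang this compiles to a single `mulhu` instruction. */
    static uint64_t umulh(uint64_t a, uint64_t b) {
        return (uint64_t)(((unsigned __int128)a * b) >> 64);
    }

    int main(void) {
        /* 2^32 * 2^32 = 2^64: low half 0, high half 1 */
        assert(umulh(1ull << 32, 1ull << 32) == 1);
        /* (2^64 - 1)^2 = 2^128 - 2^65 + 1: high half is 2^64 - 2 */
        assert(umulh(UINT64_MAX, UINT64_MAX) == UINT64_MAX - 1);
        return 0;
    }
    ```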
  15. I guess they haven't had a customer who has wanted to send something to Mars yet. If they can send something to Europa, it's not like Mars is harder.

    Blue Origin got patents on landing on a drone ship a decade ago. Until today they'd never done it.

    Not sure what your point is, other than hatred.

  16. No wireless? Less space than a Nomad? Lame.

    That aged well. Six years later it turned into the iPhone.

  17. Blue Origin just launched two 550kg probes to Mars (1.5 AU from the Sun).

    SpaceX sent a similar mass Tesla Roadster on a Mars-crossing trajectory in 2018, Psyche to an asteroid at around 3 AU in 2023, and Europa Clipper to Jupiter/Europa (5.2 AU) in 2024.

  19. Gosh. I hope you don't use Facebook, or an iPhone, or any product from a company backed by YC.

    "It is not from the benevolence of the butcher, the brewer, or the baker that we expect our dinner, but from their regard to their own interest. We address ourselves, not to their humanity but to their self-love, and never talk to them of our own necessities but of their advantages." -- Adam Smith, An Inquiry into the Nature and Causes of the Wealth of Nations, 1776

  20. Lol. EDS at its finest. I don't like Tesla and its cars any more than you do, but SpaceX and StarLink are amazing.
