
raphlinus
Joined 13,295 karma
I do research on fundamental UI technology and 2D graphics, with a focus on Rust and fonts. Currently doing open source work in my personal capacity until the end of the year, then in January I start an exciting new role.

@raph@mastodon.online


  1. My reading is that the 286 doesn't really have a lot of addressing modes, the way the 68000 and friends do. Rather, every address is generated by summing an optional immediate 8- or 16-bit displacement and zero to two registers. There aren't modes where you do one memory fetch, then use the result as the base address for a second fetch, which is arguably a vaguely RISC-flavored choice. There is a one-cycle penalty for summing three elements ("based indexed mode").
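
A minimal model of that address arithmetic, assuming 16-bit offsets and ignoring segmentation entirely. The function name is illustrative, and the one-cycle penalty is taken from the description above rather than from an Intel timing table.

```rust
/// Hypothetical model of a 286 16-bit effective address: an optional base
/// register (BX or BP), an optional index register (SI or DI), and an optional
/// displacement, summed modulo 2^16. Returns (offset, extra cycles).
fn effective_address(base: Option<u16>, index: Option<u16>, disp: Option<i16>) -> (u16, u32) {
    let ea = base
        .unwrap_or(0)
        .wrapping_add(index.unwrap_or(0))
        // An 8-bit displacement is assumed to be sign-extended to i16 by the caller.
        .wrapping_add(disp.unwrap_or(0) as u16);
    // Assumed: one extra cycle only when base, index and displacement are all present.
    let penalty = u32::from(base.is_some() && index.is_some() && disp.is_some());
    (ea, penalty)
}
```

Note there is no step that loads from memory to get the address: everything is register and immediate arithmetic, wrapping at 16 bits.
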
  2. Memory safety in particular, but really the absence of UB in general (you have to watch out for integer overflow, among other things). Beyond that, one could prove arbitrary properties, including the absence of panics (which would have helped in a recent Cloudflare outage).

    In order to prove the absence of UB, you have to be able to reason about other things. For example, to safely call qsort, you have to prove that the comparison function is a total order. That's not easy, especially when comparing larger and more complicated structures containing pointers.

    And of course, proving the lack of pointer aliasing in C is extremely difficult, even more so if pointer arithmetic is employed.
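
To make the qsort requirement concrete, a brute-force check like the following can find counterexamples to the total-order properties on a small sample. It proves nothing about inputs it hasn't seen, which is exactly the gap a verifier has to close. The function name is illustrative.

```rust
use std::cmp::Ordering;

/// Brute-force check that `cmp` behaves as a total order on a finite sample:
/// reflexive (cmp(a, a) is Equal), antisymmetric (cmp(a, b) is the reverse of
/// cmp(b, a)), and transitive (a <= b and b <= c imply a <= c).
/// O(n^3), so only suitable for small samples.
fn is_total_order_on<T>(sample: &[T], cmp: impl Fn(&T, &T) -> Ordering) -> bool {
    for a in sample {
        if cmp(a, a) != Ordering::Equal {
            return false;
        }
        for b in sample {
            if cmp(a, b) != cmp(b, a).reverse() {
                return false;
            }
            for c in sample {
                if cmp(a, b).is_le() && cmp(b, c).is_le() && !cmp(a, c).is_le() {
                    return false;
                }
            }
        }
    }
    true
}
```

A comparator that looks harmless, such as treating NaN as equal to everything (`a.partial_cmp(b).unwrap_or(Ordering::Equal)`), fails transitivity: 2 <= NaN and NaN <= 1, but not 2 <= 1.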

  3. There's a straightforward answer to the "why not" question: because it will result in codebases with the same kind of memory unsafety and vulnerability as existing C code.

    If an LLM is in fact capable of generating code free of memory safety errors, then it's certainly also capable of writing the Rust types that guarantee this and are checkable. We could go even further and have automated generation of proofs, either in C using tools similar to CompCert, or perhaps something like ATS2. The reason we don't do these at scale is that they're tedious and verbose, and that's presumably something AI can solve.

    Similar points were also made in Martin Kleppmann's recent blog post [1].

    [1]: https://martin.kleppmann.com/2025/12/08/ai-formal-verificati...

  4. That's because the one-instruction variant may read past the end of the array. Say s is a single null byte at 0x2000fff, and memory is mapped only up to (not including) 0x2001000: the function as written is fine, but the optimized version may page fault.
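
The arithmetic behind the fault is simple. A minimal sketch, assuming 4 KiB pages (typical, but not universal); the helper names are illustrative.

```rust
/// Assumed page size: 4 KiB is common but not universal.
const PAGE_SIZE: usize = 4096;

/// Does a `width`-byte load starting at `addr` touch more than one page?
/// If the second page is unmapped, the load can fault even though the byte
/// at `addr` itself is valid.
fn load_crosses_page(addr: usize, width: usize) -> bool {
    addr % PAGE_SIZE + width > PAGE_SIZE
}

/// Round `addr` down to a multiple of `width` (a power of two no larger than
/// the page size). A load that starts there stays within a single page, which
/// is why vectorized string routines typically use aligned loads.
fn aligned_load_start(addr: usize, width: usize) -> usize {
    addr & !(width - 1)
}
```

A one-byte read at 0x2000fff is fine, a two-byte read crosses into 0x2001000, and a 16-byte read aligned down to 0x2000ff0 ends exactly at the page boundary.
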
  5. Unfortunately graphics APIs suck pretty hard when it comes to actually sharing memory between CPU and GPU. A copy is definitely required when using WebGPU, and also on discrete cards (which is what these APIs were originally designed for). It's possible that using native APIs directly would let us avoid copies, but we haven't done that.
  6. It's analogous, but vertex shaders only deal with triangles, and in 2D graphics you have a lot of other stuff going on.

    The actual fine rasterization happens in quads, so there's a simple vertex shader that runs on the GPU, sampling from geometry buffers that are produced on the CPU and uploaded.
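
A CPU-side sketch in Rust of what such a vertex shader computes (a real one would be written in a shading language such as WGSL). The instance layout is hypothetical, not Vello's actual buffer format.

```rust
/// Hypothetical per-instance record for one quad, as it might appear in a
/// geometry buffer produced on the CPU and uploaded.
#[derive(Clone, Copy)]
struct QuadInstance {
    x: f32,
    y: f32,
    width: f32,
    height: f32,
}

/// Position of one of the 4 vertices of a triangle-strip quad, derived from
/// the vertex index alone, so no per-vertex data needs to be uploaded.
fn quad_vertex(vertex_index: u32, q: QuadInstance) -> [f32; 2] {
    // Bit 0 selects left/right, bit 1 selects top/bottom:
    // 0 -> (0,0), 1 -> (1,0), 2 -> (0,1), 3 -> (1,1).
    let u = (vertex_index & 1) as f32;
    let v = (vertex_index >> 1) as f32;
    [q.x + u * q.width, q.y + v * q.height]
}
```

Vertices 0..4 as a triangle strip form triangles (0, 1, 2) and (1, 2, 3), which together cover the quad.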

  7. Thanks for the pointer, we were not actually aware of this, and the claimed benchmark numbers look really impressive.
  8. The output of this renderer is a bitmap, so if your environment is GPU-based you have to upload it to the GPU. As part of the larger work, we also have Vello Hybrid, which does the geometry on the CPU but the pixel painting on the GPU.

    We have definitely thought about using the CPU renderer while the shaders are being compiled (shader compilation is a problem), but haven't implemented it.
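
Since this isn't implemented, here is only a hypothetical sketch of how the fallback might be structured: render on the CPU each frame, poll a channel without blocking, and switch once a background compile delivers the pipeline. `GpuPipeline`, `Renderer` and `Backend` are made-up names, not Vello types.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Hypothetical stand-in for a compiled GPU render pipeline.
struct GpuPipeline;

#[derive(Debug, PartialEq)]
enum Backend {
    Cpu,
    Gpu,
}

/// Draws with the CPU renderer until a background thread finishes compiling
/// the shaders and sends the pipeline over, then switches to the GPU path.
struct Renderer {
    pending: Option<Receiver<GpuPipeline>>,
    gpu: Option<GpuPipeline>,
}

impl Renderer {
    fn new(pending: Receiver<GpuPipeline>) -> Self {
        Renderer { pending: Some(pending), gpu: None }
    }

    /// Choose the backend for this frame, never blocking on compilation.
    fn frame(&mut self) -> Backend {
        let msg = self.pending.as_ref().map(|rx| rx.try_recv());
        match msg {
            Some(Ok(pipeline)) => {
                self.gpu = Some(pipeline);
                self.pending = None;
            }
            // Compile thread went away without a result: stay on the CPU path.
            Some(Err(TryRecvError::Disconnected)) => self.pending = None,
            Some(Err(TryRecvError::Empty)) | None => {}
        }
        if self.gpu.is_some() { Backend::Gpu } else { Backend::Cpu }
    }
}
```

In an application, the `Sender` half would be moved into a `std::thread::spawn` closure that performs the compile and sends the result.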

  9. Another deep dive is in https://www.copetti.org/writings/consoles/master-system/

    I've got a mostly-written emulator (in Rust). The system is very easy to emulate, possibly the best return in gameplay for the emulator coding effort aside from the NES. My main intent in writing this emulator is to get it running on an RP2350 board, like the Adafruit Fruit Jam or Olimex RP2350pc.

    It should also be possible to get the next generation (SNES, Genesis) on such hardware, but it's a much tighter fit and more effort.

  10. I almost mentioned it in the talk, as an example of a language that's deployed very successfully and expresses parallelism at scale. Ultimately I didn't, as the core of what I'm talking about is control over dynamic allocation and scheduling, and that's not the strength of VHDL.
  11. Right. This is the binary tree version of the algorithm, and it's nice, concise, and very readable. What would take it to the next level for me is the version in the stack monoid paper, which chunks the work up into workgroups. I haven't done benchmarks against the Pareas version (unfortunately that's not easy), but I would expect the workgroup-optimized version to be quite a bit faster.
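
The algorithm under discussion isn't quoted here. Assuming it's bracket matching, the binary tree shape looks roughly like this, using a simplified monoid that only counts unmatched brackets (the full stack monoid also records which elements are on the stack). All names are illustrative, and the workgroup-chunked version is not shown.

```rust
/// Unmatched closing brackets on the left, unmatched opening brackets on the right.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Brackets {
    close: u32,
    open: u32,
}

impl Brackets {
    const IDENTITY: Brackets = Brackets { close: 0, open: 0 };

    fn of(c: char) -> Brackets {
        match c {
            '(' => Brackets { close: 0, open: 1 },
            ')' => Brackets { close: 1, open: 0 },
            _ => Brackets::IDENTITY,
        }
    }

    /// Associative combine: b's closers cancel as many of a's openers as possible.
    fn combine(a: Brackets, b: Brackets) -> Brackets {
        let matched = a.open.min(b.close);
        Brackets { close: a.close + b.close - matched, open: a.open + b.open - matched }
    }
}

/// Binary-tree reduction: reduce each half, then combine. Associativity makes
/// this equal to a left-to-right fold and lets the halves run in parallel.
fn reduce_tree(xs: &[Brackets]) -> Brackets {
    match xs.len() {
        0 => Brackets::IDENTITY,
        1 => xs[0],
        n => {
            let (l, r) = xs.split_at(n / 2);
            Brackets::combine(reduce_tree(l), reduce_tree(r))
        }
    }
}
```
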
  12. Yes, sorry about that. We had tech issues, and did the best we could with the audio that was captured.
  13. It's not strictly x86 either; the other case you care about is fp16 support on ARM. But fp16 is included in the M1 target, so in practice it only matters on other ARM chips.
  14. I'm extremely curious what those basic methods are. We're in the process of replacing the higher-order rootfinding in kurbo with a new solver based on Yuksel's method[1]. If you know of simpler, faster techniques, I'd be quite interested.

    [1]: https://crates.io/crates/polycool
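
A minimal sketch of the structure of Yuksel's method as we understand it: the roots of the derivative split the interval into pieces on which the polynomial is monotonic, so each piece holds at most one root, and the recursion bottoms out at a linear polynomial. This is not polycool's code or API, and it uses plain bisection where the method proper uses Newton iterations safeguarded by bisection.

```rust
/// Evaluate c[0] + c[1]*x + c[2]*x^2 + ... using Horner's rule.
fn eval(c: &[f64], x: f64) -> f64 {
    c.iter().rev().fold(0.0, |acc, &ci| acc * x + ci)
}

/// Coefficients of the derivative polynomial.
fn derivative(c: &[f64]) -> Vec<f64> {
    c.iter().enumerate().skip(1).map(|(i, &ci)| i as f64 * ci).collect()
}

/// Root in [a, b], given that f(a) and f(b) have opposite signs (plain bisection).
fn bisect(c: &[f64], mut a: f64, mut b: f64) -> f64 {
    let fa = eval(c, a);
    for _ in 0..100 {
        let m = 0.5 * (a + b);
        let fm = eval(c, m);
        if fm == 0.0 {
            return m;
        }
        // Keep the half whose endpoints still bracket the sign change.
        if (fm < 0.0) == (fa < 0.0) { a = m; } else { b = m; }
    }
    0.5 * (a + b)
}

/// Real roots in [lo, hi], in increasing order.
fn roots_in(c: &[f64], lo: f64, hi: f64) -> Vec<f64> {
    // Trim zero highest-degree coefficients so the degree is right.
    let mut n = c.len();
    while n > 0 && c[n - 1] == 0.0 {
        n -= 1;
    }
    let c = &c[..n];
    match n {
        0 | 1 => return Vec::new(), // constant: no isolated roots
        2 => {
            let r = -c[0] / c[1]; // linear: solve directly
            return if (lo..=hi).contains(&r) { vec![r] } else { Vec::new() };
        }
        _ => {}
    }
    // Critical points split [lo, hi] into monotonic pieces.
    let mut pts = vec![lo];
    pts.extend(roots_in(&derivative(c), lo, hi));
    pts.push(hi);
    let mut roots = Vec::new();
    for w in pts.windows(2) {
        let (fa, fb) = (eval(c, w[0]), eval(c, w[1]));
        if fa == 0.0 {
            roots.push(w[0]);
        } else if fa * fb < 0.0 {
            roots.push(bisect(c, w[0], w[1]));
        }
    }
    if eval(c, hi) == 0.0 {
        roots.push(hi);
    }
    roots.dedup();
    roots
}
```

For example, (x-1)(x-2)(x-3), with coefficients [-6, 11, -6, 1], recurses through its quadratic derivative down to 6x - 12 and brackets each of the three roots separately.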

  15. I have very high hopes for this board, and have been playing with the RP2350 with DVI output for a while (I have one of these on order, though it hasn't arrived yet; other boards[1] exist).

    Emulation is a sweet spot because if you race the beam, there is no compositor latency. Basically every retro computer with less than a quarter meg of VRAM is fair game (whether a framebuffer or not).

    I have a bit of time off this fall and intend to do some fun things.

    [1]: https://github.com/DusterTheFirst/pico-dvi-rs/wiki/RP2350-DV...

  16. There's a new game in town for portable, multiversioned Rust SIMD: fearless_simd[1]. It's still early days (we're gearing up for a 0.2 release soon), but we are using it very successfully to accelerate rendering algorithms in vello_cpu and vello_hybrid. I believe it represents the best compromise on stable Rust today. We're not saying it's ready for production use yet, but I encourage people exploring this space to try it and give us feedback on how well it works.

    There's also a big discussion to be had about how the Rust language might more natively support SIMD. There are some hacks in fearless_simd to work around limitations in the language (especially relying on inlining as load-bearing), and it would be excellent to make that more robust. But the best path forward is not obvious.

    [1]: https://github.com/linebender/fearless_simd
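
For context on why inlining ends up load-bearing, here is the hand-rolled, std-only multiversioning pattern on x86_64. This does not use fearless_simd (its API isn't reproduced here); `scale` and its helpers are illustrative names.

```rust
/// Portable kernel. #[inline(always)] is load-bearing: the loop is only
/// compiled with AVX2 enabled if it is inlined into the wrapper below.
#[inline(always)]
fn scale_kernel(xs: &mut [f32], k: f32) {
    for x in xs.iter_mut() {
        *x *= k;
    }
}

/// Same kernel, compiled with AVX2 enabled for this function body.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn scale_avx2(xs: &mut [f32], k: f32) {
    scale_kernel(xs, k)
}

/// Dispatch at runtime to the best version the CPU supports.
fn scale(xs: &mut [f32], k: f32) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: we just checked that this CPU supports AVX2.
            unsafe { scale_avx2(xs, k) };
            return;
        }
    }
    scale_kernel(xs, k)
}
```

If the compiler declines to inline the kernel (for example, because the attribute is dropped), the AVX2 wrapper silently ends up calling a baseline-compiled loop, and nothing warns you.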

  17. Along similar lines but physically much smaller, there are currently about 3 or 4 boards[1] that have RP2350, DVI, USB host, and SD card, ranging in cost from about $15 to $40.

    A particular sweet spot is emulating 8- and 16-bit systems, as latency can be just as good as with an FPGA setup. The infoNES emulator has been running on the RP2040 for a while, and I see projects for the Sega Master System, Genesis, Apple II, and Mac in the works. But you can also write much more powerful software natively.

    Likely it will be possible to adapt software between these various RP2350 systems.

    [1]: https://github.com/DusterTheFirst/pico-dvi-rs/wiki/RP2350-DV...

  18. Yes. The problem here is the -x operation. If INT_MIN is in the array, then the negation operation itself is UB. As you say, the fix is to skip values equal to INT_MIN; it's not possible that its negation is in the array, as that number is not representable.

    Rust is only a little better. With default settings, it will panic if isize::MIN is in the input slice in a debug build, and in a release build will incorrectly return true if there are two such values in the input. But in C you'll get unicorns and rainbows.
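
The original problem isn't quoted here. Assuming it's "does the slice contain some x and -x at distinct positions", this sketch uses `checked_neg` to skip `isize::MIN`, next to a wrapping variant that reproduces the release-build false positive described above. Function names are illustrative.

```rust
use std::collections::HashSet;

/// Shared scan: returns true if some earlier element equals neg(x).
/// Elements for which `neg` returns None are skipped.
fn has_pair_with(xs: &[isize], neg: impl Fn(isize) -> Option<isize>) -> bool {
    let mut seen = HashSet::new();
    for &x in xs {
        let Some(n) = neg(x) else { continue };
        if seen.contains(&n) {
            return true;
        }
        seen.insert(x);
    }
    false
}

/// Fixed version: checked_neg returns None for isize::MIN, whose negation is
/// not representable and therefore can't be in the slice.
fn has_negation_pair(xs: &[isize]) -> bool {
    has_pair_with(xs, isize::checked_neg)
}

/// Buggy version, with negation wrapping as in a release build:
/// -isize::MIN wraps to isize::MIN, so two copies of it "match".
fn has_negation_pair_wrapping(xs: &[isize]) -> bool {
    has_pair_with(xs, |x| Some(x.wrapping_neg()))
}
```
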

  19. Have you attended recently, as in the past few months? Maybe our meeting is special because it's Berkeley, but we have a solid core of young people attending regularly. I was on Nominating Committee last cycle, and we've gotten a number of Young Friends, whereas in the recent past it was pretty much all aging members.

    You might be right about rebranding, but to me a lot of what appeals is the focus on the substance rather than perceptions.

  20. We're very much thinking along similar lines. I also have an idea for a 4bpp image encoding that could be fast enough to stream from an SD card while still being high quality, given preprocessing.
  21. For 640x480 output without overclocking, I estimate tile + sprite CPU utilization at about 50% of one core. Of course, you have two cores. That number goes up or down depending on resolution, particularly when you're pixel doubling.

    It's absolutely doable. There's the beginning of a tile demo (a scrolling brick wall) in the pico-dvi-rs repo.
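
A rough sketch of what a race-the-beam tile + sprite renderer does once per scanline, with horizontal pixel doubling. The data layout (8x8 tiles, one palette byte per pixel, sprite index 0 transparent) is hypothetical and is not the pico-dvi-rs code.

```rust
const TILE: usize = 8;

/// Hypothetical 8x8 tile with one palette index per pixel.
type Tile = [[u8; TILE]; TILE];

/// Hypothetical sprite: 8x8 pixels at (x, y) in source coordinates; index 0 is transparent.
struct Sprite {
    x: i32,
    y: i32,
    pixels: Tile,
}

/// Render source scanline `y` into `out`, doubling each pixel horizontally
/// (so `out.len()` is twice the source width). `map[row][col]` holds tile indices.
fn render_scanline(y: usize, map: &[Vec<usize>], tiles: &[Tile], sprites: &[Sprite], out: &mut [u8]) {
    let src_w = out.len() / 2;
    let map_row = &map[y / TILE];
    // Background: look up the tile under each source pixel.
    for x in 0..src_w {
        let c = tiles[map_row[x / TILE]][y % TILE][x % TILE];
        out[2 * x] = c;
        out[2 * x + 1] = c;
    }
    // Sprites, later ones on top; skip any not on this line.
    for s in sprites {
        let sy = y as i32 - s.y;
        if !(0..TILE as i32).contains(&sy) {
            continue;
        }
        for sx in 0..TILE {
            let x = s.x + sx as i32;
            let c = s.pixels[sy as usize][sx];
            if c != 0 && (0..src_w as i32).contains(&x) {
                out[2 * x as usize] = c;
                out[2 * x as usize + 1] = c;
            }
        }
    }
}
```

With pixel doubling, the per-line work scales with the source width rather than the output width, which is one reason utilization varies with resolution.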

  22. This looks really cool; I ordered one. I'm also waiting for the Fruit Jam, mentioned elsethread.

    The pico-dvi-rs project[1] has an early prototype of race-the-beam video generation, which I think has a lot of potential: it will allow much richer content than a framebuffer on this kind of device. One fun thing we've got going is proportionally spaced bitmap fonts, which are fairly unusual in this form factor. Please get in touch with me if you're interested in driving this thing with Rust.

    [1]: https://github.com/DusterTheFirst/pico-dvi-rs
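
The core of proportional spacing is just advancing by each glyph's own width instead of a fixed cell. A minimal sketch with a hypothetical glyph format (at most 8 pixels wide, one byte per row), not the format pico-dvi-rs uses; the 1-pixel gap between glyphs is also an assumption.

```rust
/// Hypothetical proportional bitmap glyph: `width` (at most 8) pixels wide,
/// one byte per row, where bit (7 - i) is column i.
struct Glyph {
    width: u8,
    rows: [u8; 8],
}

/// Draw row `y` of a string of glyph indices into a 1-bit scanline,
/// advancing x by each glyph's own width plus a 1-pixel gap.
/// Returns the x position after the last glyph.
fn draw_text_row(glyphs: &[Glyph], text: &[usize], y: usize, line: &mut [bool]) -> usize {
    let mut x = 0;
    for &g in text {
        let glyph = &glyphs[g];
        let bits = glyph.rows[y];
        for i in 0..glyph.width as usize {
            if bits & (0x80 >> i) != 0 {
                // Pixels past the end of the scanline are clipped.
                if let Some(px) = line.get_mut(x + i) {
                    *px = true;
                }
            }
        }
        x += glyph.width as usize + 1;
    }
    x
}
```

An "i" one pixel wide and an "m" five pixels wide then take up 2 and 6 pixels of advance respectively, rather than a fixed cell each.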

  23. It's device sRGB for the time being, but more color spaces are planned.

    You are correct that conflation artifacts are a problem and that doing antialiasing in the right color space can improve quality. Long story short, that's future research. There are tradeoffs: one is that use of the system compositor is curtailed; another is that font rendering tends to look weak and spindly compared with compositing in device space.
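
To illustrate the weak-and-spindly effect with the standard sRGB transfer functions (single channel, not Vello's pipeline): an edge pixel of black text on white with 50% coverage comes out at 0.5 when blended on encoded sRGB values, but around 0.735 when blended in linear light, so glyph edges look lighter and thinner.

```rust
/// Standard sRGB decode (encoded value in [0, 1] to linear light).
fn srgb_to_linear(c: f32) -> f32 {
    if c <= 0.04045 { c / 12.92 } else { ((c + 0.055) / 1.055).powf(2.4) }
}

/// Standard sRGB encode (linear light in [0, 1] to encoded value).
fn linear_to_srgb(l: f32) -> f32 {
    if l <= 0.0031308 { l * 12.92 } else { 1.055 * l.powf(1.0 / 2.4) - 0.055 }
}

/// Coverage-weighted blend directly on encoded values ("device" sRGB).
fn blend_srgb(fg: f32, bg: f32, coverage: f32) -> f32 {
    fg * coverage + bg * (1.0 - coverage)
}

/// Same blend performed in linear light, then re-encoded.
fn blend_linear(fg: f32, bg: f32, coverage: f32) -> f32 {
    let l = srgb_to_linear(fg) * coverage + srgb_to_linear(bg) * (1.0 - coverage);
    linear_to_srgb(l)
}
```
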

  24. 1. I like ryg's "A trip through the Graphics Pipeline" [1]. It's from 2011 but holds up pretty well, as the fundamentals haven't changed. The main new topic, perhaps, is the rise of tile based deferred rendering, especially on mobile.

    2. I skipped over this in the interest of time. Nevermark has the central insight, but the full story is more interesting. For each tile, detect whether the line segment crosses the top edge of the tile and, if so, in which direction. This gives you a delta of -1, 0, or +1. Then do a prefix sum of these deltas over the sorted tiles. That gives you the winding number at the top-left corner of each tile, which in turn lets you compute the sparse fills and also which side to fill within the tile.

    [1]: https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-...
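
A minimal sketch of that prefix sum and the sparse fills it enables, with a hypothetical tile record. It assumes closed paths, so each row's deltas sum to zero and a single running sum over all sorted tiles returns to zero at the start of every row, and it uses the nonzero fill rule.

```rust
/// Hypothetical tile record: a tile touched by a path segment, with the signed
/// crossing of that segment over the tile's top edge (-1, 0 or +1).
#[derive(Clone, Copy)]
struct TileDelta {
    x: u16,
    y: u16,
    delta: i32,
}

/// Sort by (row, column) and take an exclusive prefix sum of the deltas,
/// giving the winding number at the top-left corner of each distinct tile.
fn tile_windings(mut tiles: Vec<TileDelta>) -> Vec<(u16, u16, i32)> {
    tiles.sort_by_key(|t| (t.y, t.x));
    let mut out = Vec::new();
    let mut acc = 0;
    let mut i = 0;
    while i < tiles.len() {
        let (x, y) = (tiles[i].x, tiles[i].y);
        out.push((x, y, acc));
        // Several segments can touch the same tile; fold in all their deltas.
        while i < tiles.len() && (tiles[i].x, tiles[i].y) == (x, y) {
            acc += tiles[i].delta;
            i += 1;
        }
    }
    out
}

/// Sparse fills: runs of untouched tiles between two touched tiles in the same
/// row where the winding number is nonzero. Returns (row, first column, end column).
fn sparse_fills(windings: &[(u16, u16, i32)]) -> Vec<(u16, u16, u16)> {
    windings
        .windows(2)
        .filter_map(|w| {
            let ((x0, y0, _), (x1, y1, wind)) = (w[0], w[1]);
            (y0 == y1 && x1 > x0 + 1 && wind != 0).then_some((y0, x0 + 1, x1))
        })
        .collect()
}
```

For example, if a rectangle's left edge crosses the top of tile (1, 1) with delta +1 and its right edge crosses the top of tile (4, 1) with delta -1, tile (4, 1) gets winding 1 at its corner, and tiles 2 and 3 of row 1 become one solid sparse fill.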

  25. Yes, I was born in Enkhuizen.

    The newest spline work (hyperbezier) is still on the back burner, as I'm refining it. This turns out to be quite difficult, but I'm hopeful it will turn out better than the previous prototype you saw.

