Comment by tptacek - Hacker Neue

tptacek Apr 1, 2022 parent

I think it's worth repeating this:

While there are platforms with better and worse hardware sources of unpredictable bits, the problem with Linux /dev/random isn't so much a hardware issue, but rather a software one. The fundamental problem Linux tries to solve isn't that hard, as you can see from the fact that so many other popular platforms running on similar hardware have neatly solved it.

The problem with the LRNG is that it's been architecturally incoherent for a very long time (entropy estimation, urandom vs random, lack of clarity about seeding and initialization status, behavior during bootup). As a result, an ecosystem of software has grown roots around the design (and bugs) of the current LRNG. Major changes to the behavior of the LRNG break "bug compatibility", and, because the LRNG is one of the core cryptographic facilities in the kernel, this is an instance where you really really don't want to break userland.

The basic fact of kernel random number generation is this: once you've properly seeded an RNG, your acute "entropy gathering" problem is over. Continuous access to high volumes of high-entropy bits is nice to have, but the kernel gains its ability to satisfy gigabytes of requests for random numbers from the same source that modern cryptography gains its ability to satisfy gigabytes of requests for ciphertext with a 128-bit key.
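
To make the "seed once, then stretch" point concrete, here is a minimal sketch in Python. It is an illustration only: the function name expand_seed is made up for this example, and HMAC-SHA256 in counter mode is a stand-in I picked because it is in the standard library, not the construction the LRNG actually uses. The all-zero seed is a placeholder so the output is reproducible; a real seed would have to be secret and unpredictable.

```python
import hashlib
import hmac


def expand_seed(seed: bytes, nbytes: int) -> bytes:
    """Stretch a short seed into nbytes of output.

    Hypothetical helper: HMAC-SHA256 keyed with the seed, applied to an
    incrementing counter. This is an assumption made for illustration,
    not the kernel's algorithm.
    """
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        # Each counter value yields one fresh 32-byte block.
        block = hmac.new(seed, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:nbytes])


# Placeholder 128-bit seed (all zeros) so the example is deterministic.
seed = bytes(16)
stream = expand_seed(seed, 1_000_000)
print(len(stream))  # 1000000 bytes derived from a 16-byte seed
```

Nothing in the loop goes back to ask for more entropy; the only secret input is the 16-byte seed, which is the same situation a cipher is in when it encrypts gigabytes under one key.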

People looking to platform hardware (or who fixate on the intricacies of threading the LRNG into guest VMs) are mostly looking in the wrong place for problems to solve. The big issue today is that the LRNG is still pretty incoherent, but nobody really knows what would break if it was designed more carefully.

octoberfranklin Apr 1, 2022

The piece I've been missing in this whole debate: why isn't the existing RNG simply frozen in its current bug-exact-behavior state and a new /dev/sane_random created?

Stuff that depends on the existing bugs in order to function can keep functioning. Everything else can move to something sane.

Obviously I'm missing something here.

tptacek OP Apr 1, 2022

Because /dev/sane_random or sane_random(2) has better security properties than what we have now, and you want the whole gamut of Linux software to benefit from that; just as importantly, you don't want /dev/urandom and getrandom(2) to fall into disrepair as attention shifts to the new interface, for the same reason that you care very much about UAF vulnerabilities in crappy old kernel facilities most people don't build new stuff on anymore.

Also, just, it seems unlikely that the kernel project is going to agree to run two entire unrelated CSPRNG subsystems at the same time! The current LRNG is kind of an incoherent mess root and branch; it's not just a matter of slapping a better character device and system call on top of it.

burnished Apr 1, 2022

Because you answered their question, I'm hoping you can answer my question.

How is there any overlap in the devices that can't have something clever figured out and devices that could possibly see an update to their kernel code?

zamadatix Apr 2, 2022

On the kernel side, something clever will almost certainly be figured out eventually, just not in time for the 5.18 release (or, realistically, probably the following release either). On the user-space side, it doesn't matter if there is an absolutely trivial clever fix available: you can't just break it without an extremely good reason.

Note: Extremely good reason for breaking userspace is along the lines of "/dev/random has been found to be insecure causing mass security mayhem" not "man I'd really like to ignore the 0.01% of users this would cause an issue for so I can get my patch in faster".

thanatos519 Apr 1, 2022

cough mysql_real_escape_string cough

hsbauauvhabzb Apr 1, 2022

Windows APIs from what I hear share a similar issue to /dev/random (apps rely on bugs in APIs). Maybe the problem is the lack of forward thinking to fix issues.

orra Apr 1, 2022

> Obviously I'm missing something here

For a start, there's a long tail of migrating all useful software to /dev/sane_random. Moreover, there's a risk new software accidentally uses the old broken /dev/random.

Besides, /dev/sane_random essentially exists; it's just a syscall called getrandom().
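
For anyone who hasn't used it, a rough sketch of calling getrandom(2) from Python via os.getrandom (Linux only, Python 3.6+). The wrapper name random_bytes and the 256-byte cap are my own choices for the example; the cap is there because the man page only promises a full, uninterrupted read for requests of up to 256 bytes once the pool is initialized. The os.urandom fallback is an assumption for platforms where os.getrandom is missing.

```python
import os


def random_bytes(n: int) -> bytes:
    """Hypothetical wrapper around getrandom(2) for small requests."""
    if not 0 < n <= 256:
        # Stay within the size the kernel returns in full once initialized.
        raise ValueError("n must be between 1 and 256")
    if hasattr(os, "getrandom"):
        try:
            # GRND_NONBLOCK fails instead of waiting if the kernel RNG
            # has not been initialized yet (e.g. very early in boot).
            return os.getrandom(n, os.GRND_NONBLOCK)
        except BlockingIOError:
            # Not seeded yet: fall back to the default blocking call,
            # which waits until the RNG has been initialized.
            return os.getrandom(n)
    # Assumed fallback for platforms without os.getrandom.
    return os.urandom(n)


key = random_bytes(32)
print(len(key))  # 32
```

The point of the flag dance is that the caller can find out whether the RNG has been seeded, which is exactly the initialization status that reading a device file never made clear.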

hansel_der Apr 2, 2022

> Moreover, there's a risk new software accidentally uses the old broken /dev/random.

but that risk is at 100% NOW, how is it not worth reducing it?

orra Apr 2, 2022

I fully agree.

tptacek OP Apr 1, 2022

It's not that simple; Donenfeld wants to replace the whole LRNG with a new engine that uses simpler, more modern, and more secure/easier-to-analyze cryptography, and one of the roadblocks there is that swapping out the engine risks breaking bugs that userland relies on.

vilhelm_s Apr 1, 2022

What kind of bugs are visible to userland? I would have thought a random number device would be the least likely thing to have upgrade problems like that: applications should not be able to assume anything at all since the output is literally random...

westurner Apr 1, 2022

Shouldn't it be easier than a kernel parameter to compare the performance of specific applications that relied upon the current behaviors; at least for a major rev or two?
