Preferences

rienbdj parent
How many values can a UUID v4 take?

How many do you have to generate before a collision becomes a remote possibility?


NickHoff
A UUID v4 is a 128 bit number, but 4 bits are reserved to specify the version number and 2 more bits are reserved to specify the variant, which leaves 122 bits of randomness. That means it can take on 5 x 10^36 possible values. Following the birthday math, you'd have to generate about 103 trillion UUIDs to have a one-in-a-million chance of having a collision.
jandrewrogers
UUIDv4 has 2^122 values.

The heuristic commonly used for things that matter (e.g. high-security/high-assurance systems) is that the probability of collision should be less than 2^-32 assuming uniform distribution[0]. From this you can compute that the largest set of keys that can be used with UUIDv4 that satisfies this constraint is roughly 100 trillion.

This is a pretty high limit that will work for most applications. Some large data models can exceed this number of records, so you can't use probabilistic UUID naively in these cases e.g. one for every unique record. In data models that approach the 100T limit, UUID-like identifiers are typically generated deterministically to avoid this issue entirely.

[0] There have been many cases of UUIDv4 systems breaking in the wild because people naively or accidentally use weak entropy sources. This turns out to be a recurring footgun such that use of UUIDv4 is prohibited in some applications because you can't rely on people to implement it properly.

toast0
The question becomes, how bad is your random.

If your random is not uniformly distributed, you might get duplication from bias.

If your random is setup wrong and you get the same seeding multiple times, you might get duplication from that.

If your random is good, the birthday math should hold.

ethan_smith
UUID v4 has 122 random bits giving 2^122 possible values (~5.3×10^36). Using the birthday paradox, you'd need to generate about 2^61 UUIDs (~2.3×10^18) for a 50% collision probability, which is well beyond any practical system's capacity.
jandrewrogers
> ...which is well beyond any practical system's capacity.

Well beyond a single server but not single systems. Large analytical data models run into the tens of exabytes in practical systems already. It isn't hypothetical and probabilistic identifiers become inadvisable[0] in those systems.

Not everything is a web app and UUIDv4 does not scale infinitely.

[0] Technically you can use probabilistic identifiers if you widen them beyond 128-bits, but at that scale the compactness of unique identifiers has a large impact on performance and cost, so 128-bit deterministic identifiers are preferable.

masklinn
A uuid has 122 bits of payload.

Depends what you consider “a remote possibility” to be (the birthday attack wiki page has a table for various p and powers of 2)

This item has no comments currently.