
> Stripping information from an identifier disconnects a piece of data from the real world which means we no longer can match them. But such connection is the sole purpose of keeping the data in the first place.

The identifier is still connected to the user's data, just through the appropriate other fields in the table as opposed to embedded into the identifier itself.

> So, what happens next is that the real world tries to adjust and the "data-less" identifier becomes a real world artifact. The situation becomes the same but worse (eg. you don't exist if you don't remember your social security id). In extreme cases people are tattooed with their numbers.

Using a random UUID as primary key does not mean users have to memorize that UUID. In fact in most cases I don't think there's much reason for it to even be exposed to the user at all.

You can still look up their data from their current email or phone number, for instance. Indexes are not limited to the primary key.
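To make that concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table, its columns, and the `find_by_email` helper are hypothetical. The point is only that the row is keyed by a random UUID, while lookups go through a separate index on the user's current email.

```python
import sqlite3
import uuid

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT NOT NULL, phone TEXT)")
# Secondary index: lookups by email never need to know the primary key.
conn.execute("CREATE UNIQUE INDEX users_email ON users (email)")

user_id = str(uuid.uuid4())  # random identifier, carries no real-world information
conn.execute(
    "INSERT INTO users (id, email, phone) VALUES (?, ?, ?)",
    (user_id, "alice@example.com", "+1-555-0100"),
)

# The user changes their email: the indexed column changes, the identifier does not.
conn.execute("UPDATE users SET email = ? WHERE id = ?", ("alice@example.org", user_id))


def find_by_email(email):
    """Search by a real-world attribute; returns the stable identifier or None."""
    row = conn.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchone()
    return row[0] if row else None
```

Looking up `"alice@example.org"` returns the same `user_id` as before the change, and the old address no longer matches anything.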

> The solution is not to come up with yet another artificial identifier but to come up with better means of identification taking into account the fact that things change.

A fully random primary key does take into account that things change, since it doesn't embed any real-world information. That said, I also don't see much issue with embedding the creation time in the UUID for performance reasons, as the article suggests.
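For illustration, a sketch of what "creation time in the UUID" can look like, following the version 7 layout from RFC 9562: a 48-bit Unix timestamp in milliseconds, then the version, variant and random bits. `uuid7_sketch` is a hypothetical hand-written helper, not any particular library's API. The leading timestamp makes IDs created later sort after earlier ones, which is usually the performance motivation: new rows land near each other in a B-tree index instead of at random positions.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Hypothetical helper building a UUID with the RFC 9562 version 7 layout:
    48-bit ms timestamp | 4-bit version | 12 random bits | 2-bit variant | 62 random bits.
    IDs from the same millisecond are ordered randomly relative to each other."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                 # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)     # low 62 random bits
    value = (
        (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the most significant bits
        | 0x7 << 76                        # version 7
        | rand_a << 64
        | 0b10 << 62                       # RFC variant bits
        | rand_b
    )
    return uuid.UUID(int=value)
```

The identifier still says nothing about who the row belongs to; it only leaks roughly when the row was created, which is the trade-off being accepted here.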


> You can still look up their data from their current email or phone number, for instance. Indexes are not limited to the primary key.

This is the key point, I think. Searching is not the same as identifying.

> Using a random UUID as primary key does not mean users have to memorize that UUID. In fact in most cases I don't think there's much reason for it to even be exposed to the user at all.

So what is such an identifier for? Is it only for some technical purposes (like replication etc.)?

Why bother with UUIDs at all for internal identifiers, then? A sequence number should be enough.

"Internal" is a blurry boundary, though - you pick integer sequence numbers, then years later an API gets bolted on to your purely internal database and now your system is vulnerable to enumeration attacks. Does a vendor system where you reference some of your internal data count as "internal"? Is UID 1 the system user that was originally used to provision the system? Better try and attack that one specifically... the list goes on.
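A toy illustration of the enumeration point, assuming a hypothetical API that returns a record for any valid ID. Counting upward from 1 recovers every sequential record, while guessing random UUIDs essentially never hits one, since a version 4 UUID has 122 random bits. The store contents and function names are made up for the example.

```python
import uuid

# Hypothetical record stores: 100 orders keyed by sequence number vs. by random UUID.
sequential = {i: f"order {i}" for i in range(1, 101)}
randomized = {uuid.uuid4(): f"order {i}" for i in range(1, 101)}


def enumerate_sequential(store, limit):
    # The attacker simply counts upward from 1 (the oldest, often most privileged, record).
    return [store[i] for i in range(1, limit + 1) if i in store]


def enumerate_random(store, attempts):
    # The attacker can only guess fresh random UUIDs; a hit is astronomically unlikely.
    guesses = (uuid.uuid4() for _ in range(attempts))
    return [store[g] for g in guesses if g in store]


print(len(enumerate_sequential(sequential, 1000)))  # 100
print(len(enumerate_random(randomized, 1000)))
```

Randomness doesn't replace access control, of course; it only removes the free map of which IDs exist.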

UUIDs or other similarly randomized IDs are useful because they don't include any ordering information or imply anything about significance, which is a very safe default despite the performance hits.

There certainly are reasons to avoid them and the article we're commenting on names some good ones, at scale. But I'd argue that if you have those problems you likely have the resources and experience to mitigate the risks, and that truly random IDs are a safer default for most new systems if you don't have one of the very specific reasons to avoid them.

> "Internal" is a blurry boundary, though

Not for me :)

"Internal" means "not exposed outside the database" (and that includes applications and any other external systems).

Internal means "not exposed outside some boundary". For most people, this boundary encompasses something larger than a single database, and this boundary can change.

UUIDs are good for creating entries concurrently where coordinating between distributed systems may be difficult.
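A rough sketch of that coordination difference, with threads in one process standing in for independent writers (a simplification: real distributed nodes don't share a lock at all, which is exactly the problem). The sequential generator needs a shared counter behind a lock, while `uuid.uuid4()` can be called by each writer on its own. The function names are made up for the example.

```python
import itertools
import threading
import uuid

# Sequence-style IDs: every writer must go through one shared counter (a coordination point).
_counter = itertools.count(1)
_counter_lock = threading.Lock()


def next_sequential_id():
    with _counter_lock:
        return next(_counter)


# Random IDs: each writer generates its own, with no shared state at all.
def next_random_id():
    return uuid.uuid4()


def run_workers(make_id, workers=4, per_worker=1000):
    """Run several writers concurrently and collect every ID they generated."""
    results = []
    results_lock = threading.Lock()  # only protects the result list, not ID generation

    def worker():
        local = [make_id() for _ in range(per_worker)]
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Both approaches yield 4000 distinct IDs here; the difference is that the random version would keep working if the writers were on separate machines with no counter to share.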

It may also be that you don't want to leak information like how many orders are being made, as could be inferred from a `/fetch_order?id=123` API with sequential IDs.
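As a sketch of how much a sequential ID can leak, assuming IDs come from a plain auto-increment starting at 1: two of your own orders bracket everyone else's orders placed in between, and a handful of observed IDs is enough for the classic "German tank problem" estimate of the total, m + m/k - 1 (m is the largest ID seen, k the number of IDs seen). Both helpers are hypothetical, and real sequences can have gaps (rolled-back transactions, cached ranges), so the numbers are approximate.

```python
def orders_between(first_id, second_id):
    # With a plain auto-increment, the gap between two of your own order IDs
    # is roughly how many orders everyone else placed in between.
    return second_id - first_id - 1


def estimate_total(observed_ids):
    """German tank problem estimate of N for IDs drawn without replacement from 1..N:
    m + m/k - 1, with m the largest observed ID and k the sample size."""
    k = len(observed_ids)
    m = max(observed_ids)
    return m + m / k - 1


print(orders_between(1000, 1250))         # 249
print(estimate_total([19, 40, 42, 60]))   # 74.0
```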

Sequential primary keys are still commonly used, though - it's a scenario-dependent trade-off.

If you expose the identifier outside the database, it is no longer "internal".

Given the chain was:

> > Using a random UUID as primary key does not mean users have to memorize that UUID. [...]

> So what is such an identifier for? [...] Why bother with UUID at all then for internal identifiers?

The context, that you're questioning what they're useful for if not for use by the user, suggests that "internal" means the complement. That is, IDs used by your company and software, and maybe even API calls the website makes, but not anything the user has to know.

Otherwise, if "internal" was intended to mean something stricter (only used by a single non-distributed database, not accessed by any applications using the database, and never will be in the future), then my response is just that many IDs are neither internal in this sense nor intended to be memorized/saved by the user.
