Comment by kleiba - Hacker Neue

Not to appear overly Bayesian here, but shouldn't the prior probabilities of the objects themselves be factored in?

I mean, in most real-life situations you probably won't know ahead of time which objects you're going to have to hash (or else you could hash them perfectly anyway).

kevindamm 1 day ago

In theory, you can treat the objects as i.i.d. giving them a uniform prior.

In practice, of course only the inputs are inputs and if you were to take that into account then you could avoid collisions, but as you say, you can't really know that ahead of time.

Unless you can, such as when there is structure to the inputs or some other source of non-uniformity that you can account for. The topic of learned indexes is worth investigating if you're interested:

https://arxiv.org/abs/1712.01208

https://arxiv.org/abs/2012.12501

This item has no comments currently.