> I haven’t read the CDC paper, but I’m guessing they just use some probabilistic hash function to define certain strings as block boundaries.
You choose a number of bits (say, 12) and spread them evenly across a 48-bit mask; if the rolling hash at any position has all of those bits set, that position is a block boundary.
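Here is a minimal Python sketch of that masking scheme. The comment doesn't name a hash, so this assumes a gear-style rolling hash (the kind FastCDC uses). The gear table, its seed, and the function names are all hypothetical. With 12 mask bits, a random hash value matches with probability 2^-12, so chunks should average about 4 KiB.

```python
import random

HASH_BITS = 48
HASH_MASK = (1 << HASH_BITS) - 1


def spread_mask(num_bits: int = 12, width: int = HASH_BITS) -> int:
    """Spread num_bits set bits evenly across a width-bit mask."""
    step = width // num_bits
    mask = 0
    for i in range(num_bits):
        mask |= 1 << (i * step)
    return mask


# Hypothetical gear table: one pseudo-random 48-bit value per byte value.
# Seeded so the same input always yields the same boundaries.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(HASH_BITS) for _ in range(256)]


def chunk_boundaries(data: bytes, mask: int) -> list[int]:
    """Return the exclusive end offset of each chunk in data."""
    boundaries = []
    h = 0
    for i, byte in enumerate(data):
        # Gear rolling hash: shifting left pushes old bytes out of the
        # 48-bit window, and the table lookup mixes in the new byte.
        h = ((h << 1) + GEAR[byte]) & HASH_MASK
        # A position is a boundary when every bit selected by the mask is set.
        if (h & mask) == mask:
            boundaries.append(i + 1)
    # Whatever follows the last boundary forms the final chunk.
    if data and (not boundaries or boundaries[-1] != len(data)):
        boundaries.append(len(data))
    return boundaries


def split_chunks(data: bytes, mask: int) -> list[bytes]:
    """Cut data into content-defined chunks."""
    chunks = []
    start = 0
    for end in chunk_boundaries(data, mask):
        chunks.append(data[start:end])
        start = end
    return chunks
```

Spreading the bits matters with this kind of hash. Bit k of a gear hash depends only on the last k+1 bytes, so a mask that reaches bit 44 bases each boundary decision on roughly 45 bytes of context instead of 12. Inserting bytes near the start of a file then shifts only the first chunk or two, and the rest still match. Practical implementations typically also enforce minimum and maximum chunk sizes.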