
Xcelerate
Do wide events really have to take up this much space? Observability is, to a large degree, a sampling problem: the goal is to reconstruct the state of the environment at a given time using a minimal amount of storage. You can accomplish that either by reducing the number of samples taken or by improving your compression.

For the latter, I have a very hard time believing we’ve already squeezed most of the juice out of compression. Surely there’s a massive amount of low-rank structure in all that redundant data. Yeah, I know these companies already use inverted indices and various sorts of trees, but I would have thought there are more research-y approaches (e.g. low-rank tensor decomposition) that, if we could figure out how to perform them efficiently, would blow the existing methods out of the water. But IDK, I’m not in that industry, so maybe I’m overlooking something.
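
To make that concrete, here’s a throwaway sketch in plain numpy (the sizes and the “events” framing are made up for illustration) of how a truncated SVD compresses data that really is low-rank:

    import numpy as np

    # Hypothetical stand-in for a block of redundant telemetry: many rows
    # that are combinations of a handful of underlying patterns.
    rng = np.random.default_rng(0)
    patterns = rng.normal(size=(5, 200))      # 5 latent "event shapes"
    weights = rng.normal(size=(10_000, 5))
    events = weights @ patterns               # 10,000 x 200 matrix, rank ~5

    # Truncated SVD keeps only the top-k singular triplets.
    U, s, Vt = np.linalg.svd(events, full_matrices=False)
    k = 5
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

    # Storage: factored form vs. the full matrix.
    full = events.size
    factored = U_k.size + s_k.size + Vt_k.size
    print(f"compression: {full / factored:.0f}x")   # ~39x on this toy example

    # Reconstruction error is negligible when the data really is low-rank.
    approx = U_k @ np.diag(s_k) @ Vt_k
    print(f"relative error: {np.linalg.norm(events - approx) / np.linalg.norm(events):.1e}")

The catch, of course, is whether real event data has this much exploitable rank, and whether the decomposition can keep up with ingest rates.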


behemot
> Do wide events really have to take up this much space?

100 PB is the total volume of the raw, uncompressed data for the full retention period (180 days). Compression is what makes it cost-efficient: on this dataset we see ~15x compression, so we only store around 6.5 PB at rest.
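
Back-of-envelope, those numbers hang together (both figures above are rounded):

    raw_total_pb = 100   # uncompressed volume over the 180-day retention window
    compression = 15     # observed compression ratio on this dataset
    print(raw_total_pb / compression)   # ~6.7 PB at rest, consistent with the ~6.5 PB figure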
