
preseinger
node A is connected to nodes B, C, D, E, F

the A->B link is under DDoS or whatever and delivers packets with 10s latency

the A->C link is faulty and has 50% packet loss

the A->{D,E,F} links are perfectly healthy

node B has one view of A's skew, which is pretty bad. node C has a different view, which is also pretty bad, for different reasons. nodes D, E, and F have a totally different view, which is basically perfect

you literally cannot "detect skew" in a way that's reliable and actionable

issues are not a function of the node, they're a function of everything between the node and the observer, and are different for each observer
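a rough sketch of that scenario, to make it concrete. everything here is hypothetical: `Link`, `observe`, the heartbeat model, the zero delay on the healthy links, and the deterministic drop pattern standing in for 50% loss. A's clock is perfect in this sketch, and each observer naively assumes delivery is instant

```python
from dataclasses import dataclass


@dataclass
class Link:
    one_way_delay_s: float  # delay from A to the observer
    loss_rate: float        # fraction of heartbeats dropped


def observe(true_offset_s, link, heartbeats=10, interval_s=1.0):
    """Return (apparent_offset_s, longest_gap_s) as seen by one observer."""
    arrivals = []
    for i in range(heartbeats):
        # Deterministic stand-in for random loss: drop evenly spaced packets.
        dropped = int((i + 1) * link.loss_rate) > int(i * link.loss_rate)
        if dropped:
            continue
        sent_at = i * interval_s                      # true send time
        stamp = sent_at + true_offset_s               # A's clock reading in the packet
        received_at = sent_at + link.one_way_delay_s  # true receive time
        arrivals.append((received_at, stamp))
    # Naive estimate: A's stamp minus the observer's own receive time.
    apparent_offset = arrivals[0][1] - arrivals[0][0]
    gaps = [b[0] - a[0] for a, b in zip(arrivals, arrivals[1:])]
    return apparent_offset, max(gaps, default=0.0)


# Assumed link conditions, loosely following the example above.
links = {
    "B": Link(one_way_delay_s=10.0, loss_rate=0.0),  # 10s latency
    "C": Link(one_way_delay_s=0.0, loss_rate=0.5),   # 50% packet loss
    "D": Link(one_way_delay_s=0.0, loss_rate=0.0),
    "E": Link(one_way_delay_s=0.0, loss_rate=0.0),
    "F": Link(one_way_delay_s=0.0, loss_rate=0.0),
}

# Same node A, same perfect clock, three different verdicts.
views = {name: observe(0.0, link) for name, link in links.items()}
```

in this toy model B concludes A is 10 seconds behind, C sees heartbeat gaps twice as long as anyone else, and D, E, and F see nothing wrong. none of them is measuring A, they're all measuring their own link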

even if clocks were perfectly accurate, there is no such thing as a single consistent time across a distributed system. two events arriving at two nodes at precisely the same moment still take some amount of time to be communicated to the other nodes in the system. that time is bounded below by the speed of light: the "light cone" defines a physical limit on the propagation of information
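the light-cone limit can be put in numbers. a minimal sketch, where the function name is made up for illustration and the figure is the vacuum speed of light (real links through fiber and switches are slower than this)

```python
# Speed of light in vacuum, in metres per second (exact by definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def min_one_way_latency_s(distance_m: float) -> float:
    """Lower bound on the time for any information to cover distance_m."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S


# Two nodes roughly 6,000 km apart (an assumed distance for illustration).
bound_s = min_one_way_latency_s(6_000_000)
```

for 6,000 km that comes out to roughly 20 ms one way, so the two nodes cannot agree on "now" any more tightly than that, no matter how good their clocks are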


withinboredom
I think you’re missing the forest for the trees a bit. In the code, order mostly doesn’t matter and where it does matter there is a monotonic clock decoupled from physical time, using an epoch framework (https://tli2.github.io/assets/pdf/epochs.pdf).
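To illustrate what "a monotonic clock decoupled from physical time" can look like, here is a minimal sketch. It is not the epoch framework from the linked paper and not the actual code; it is a plain Lamport-style counter, and the names `LogicalClock`, `tick`, and `observe` are made up for this example.

```python
class LogicalClock:
    """Monotonic counter that never consults wall-clock time."""

    def __init__(self) -> None:
        self._counter = 0

    def tick(self) -> int:
        """Advance for a local event and return the new timestamp."""
        self._counter += 1
        return self._counter

    def observe(self, remote_timestamp: int) -> int:
        """Merge a timestamp received from another node, then advance.

        The result is strictly greater than both the local counter and
        the remote timestamp, so a receive always orders after its send.
        """
        self._counter = max(self._counter, remote_timestamp) + 1
        return self._counter


# Two nodes exchanging messages; no physical clock is involved.
a, b = LogicalClock(), LogicalClock()
sent = a.tick()              # a sends a message stamped with its counter
received = b.observe(sent)   # b receives it and moves past that stamp
reply = b.tick()             # b does more local work, then replies
final = a.observe(reply)     # a merges b's stamp
```

Ordering here depends only on message flow, so clock skew between the nodes cannot reorder anything that matters.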

The clock sync is just to keep human-readable logs in order for debugging. It’s ok if it is sometimes out of order, though in practice, it never is.

preseinger OP
i'm not sure we're talking about the same thing
