
withinboredom
> The main reason we can not use system clocks is that system clocks across servers are not guaranteed to be synchronized.

Sentences like this will make me never regret moving my infrastructure to bare-metal. My clocks are synchronized down to several nanoseconds, with leap-second skew and all kinds of shiny things. It literally took a day to set up and a blessing from an ISP in the same datacenter to use their clock sources (GPS + PTP). All the other servers are synchronized to that one via Chrony.


LennyWhiteJr
Even if you are "down to several nanoseconds", a slight clock drift can be the difference between corrupt data and not, and when running at scale, it's only a matter of time before you start running into race conditions.

For a small web app, fine, but if you're running enterprise level software processing billions of DB transactions per day, clocks just don't cut it.

withinboredom OP
That’s why you buy NICs with hardware timestamp support and enable PTP. You can detect clock drift within a few packets.

Race conditions are mitigated not by clocks, but by other logic. The clock work was just something done after the frustration of reading distributed logs and seeing them out of order. Logs are basically never out of order any more and there is sanity.

antonvs
It’s not just about bare metal or not. The sort of distributed systems these patterns apply to are not small local clusters in a single datacenter.
codemac
And the really big ones do much more epic clock management, and keep things wildly in sync across the globe. See the recent work with PTP that fb and others are trying to do openly. Google and others have their own internal implementations.
withinboredom OP
In the cloud, you have very little control over the clocks. If you have bare metal, there's almost always the ability to configure (at least) GPS time sync for a server or two. If you get to the point where you have entire datacenters, there's no excuse NOT to invest in getting good clocks -- and it's likely a requirement.
bastawhiz
"Not bare metal" does not imply "the cloud". You might be part of a company that doesn't give teams access to raw servers, you might be paying for VMs with dedicated resources from a local data center, or any number of other situations.

Not everyone has the luxury of being able to procure and install hardware and/or run an antenna to someplace with GPS reception.

withinboredom OP
I’d call that a “private cloud” and I think that’s the actual term for it. You can run a private cloud on bare metal, and if you are, then you can (as in physically able to) have control over things. If you have network cards with timestamp support, you can use PTP, or any number of things. If your org doesn’t support that, that doesn’t mean it isn’t a possibility, it just means you need to find someone else to ask.
preseinger
bare metal doesn't solve this problem, clock synchronization works until it doesn't

ntp can fail, chrony can fail, system clocks can always drift undetectably

you can treat the system clock as an optimistic guess at the time, but it's never a reliable way to order anything across different machines

withinboredom OP
That’s why you buy NICs with hardware timestamp support and enable PTP.
preseinger
this doesn't magically fix the problem

node clocks are unreliable by definition, it's a fundamental invariant of distributed systems

withinboredom OP
No, but you can detect skew in just a few packets and decide if you want to drain the node. If a node continues to have issues, put that thing on eBay and get another. Or, send back the motherboard to the manufacturer if it’s new enough.
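
A minimal sketch of what that detection might look like, assuming an NTP-style exchange of four timestamps per probe; the probe transport, the threshold, and the drain decision are invented for illustration:

    import statistics

    def estimate_offset(t1, t2, t3, t4):
        # classic NTP-style estimate from one request/response exchange:
        # t1 = client send, t2 = server receive, t3 = server send, t4 = client receive
        return ((t2 - t1) + (t3 - t4)) / 2.0

    def should_drain(samples, threshold_s=100e-6):
        # samples: a handful of (t1, t2, t3, t4) tuples from recent probes;
        # the median damps out one-off queueing spikes before the comparison
        offsets = [estimate_offset(*s) for s in samples]
        return abs(statistics.median(offsets)) > threshold_s

If that comes back true for several rounds in a row, stop scheduling work onto the node and look at the clock, roughly as described above.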

Node clocks can be plenty reliable, but like any other hardware, sometimes they get defects.

preseinger
node A is connected to nodes B, C, D, E, F

the A->B link is under DDoS or whatever and delivers packets with 10s latency

the A->C link is faulty and has 50% packet loss

the A->{D,E,F} links are perfectly healthy

node B has one view of A's skew which is pretty bad, node C has a different view which is also pretty bad for different reasons, and nodes D E and F have a totally different view which is basically perfect

you literally cannot "detect skew" in a way that's reliable and actionable

issues are not a function of the node, they're a function of everything between the node and the observer, and are different for each observer

even if clocks were perfectly accurate, there is no such thing as a single consistent time across a distributed system. two events arriving at two nodes at precisely the same moment require some amount of time to be communicated to other nodes in the system, that time is a function of the speed of light, the "light cone" defines a physical limit to the propagation of information
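
To make the observer-dependence concrete: the usual NTP-style offset estimate is only exact when the path is symmetric, so the congested A->B link above poisons B's view of A even though A's clock never moved (the numbers below are invented):

    def estimate_offset(t1, t2, t3, t4):
        # assumes outbound and return delays are equal; any asymmetry
        # shows up directly as apparent clock skew
        return ((t2 - t1) + (t3 - t4)) / 2.0

    # healthy A->D link, roughly 1 ms each way: estimate is ~0, matching reality
    print(estimate_offset(0.000, 0.001, 0.001, 0.002))     # ~0.0 s

    # congested A->B link, ~10 s outbound and ~1 ms back: B "measures" A
    # as about 5 seconds off, even though A's clock is fine
    print(estimate_offset(0.000, 10.000, 10.000, 10.001))  # ~5.0 s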

flaminHotSpeedo
It's easy to say you've solved distributed computing problems when you're not actually doing distributed computing
forkbomb123
The reason services like AWS or Azure are distributed is so that your resources are not concentrated, which helps with fault tolerance. If your datacenter goes down, your whole service goes down. Also, as was mentioned in other comments, asynchrony applies within the same datacenter too.
withinboredom OP
That’s what insurance is for. Far less expensive than maintaining fault tolerance at the current scale. If there were a fire or something (like Google’s recent explosion in France from an overflowing toilet), we’d lose at least several minutes of data, and be able to boot up in a cloud with degraded capabilities within ten minutes or so. Not too worried about it.
plandis
Not sure what you’re doing but it sounds like a great example for your competitors to point to and tell customers that they can avoid these issues.
withinboredom OP
It’s a project for fun (a hobby), and given away for free. 10 minutes of downtime every 20-30 years is perfectly acceptable to me.
blackoil
You can do the same in AWS by using TimeSync and limiting yourself to one AZ.
slt2021
I never understood why "The main reason we can not use system clocks is that system clocks across servers are not guaranteed to be synchronized." is considered True even with working NTP synchronization?
withinboredom OP
A few milliseconds of difference (which is about the best you can get with NTP) can mean all the difference in the world at high enough throughput. When you can control the networking cards and time sources, you can get it within a few nanoseconds across an entire datacenter, with monitoring to drain the node if clock skew gets too high.
slt2021
And why are applications so sensitive to such a small difference in time?

Seems like poor engineering practice.

preseinger
usual problem is when you try to model logical causality (a before b) with physical time (a.timestamp < b.timestamp)

logical causality does not represent poor engineering practice :)
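
A tiny sketch of the logical side of that, a Lamport clock: it captures "a happened before b" without consulting any wall clock (the class and method names are just illustrative):

    class LamportClock:
        # one instance per process/node
        def __init__(self):
            self.counter = 0

        def local_event(self):
            # any local step bumps the counter
            self.counter += 1
            return self.counter

        def send(self):
            # stamp outgoing messages with the current counter
            self.counter += 1
            return self.counter

        def receive(self, remote_counter):
            # on receipt, jump past whatever the sender had seen
            self.counter = max(self.counter, remote_counter) + 1
            return self.counter

If a causally precedes b, a's stamp is smaller than b's; the converse doesn't hold, which is the gap vector clocks try to close.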

slt2021
this only applies if you carry over physical time from one machine to another, assuming perfect physical synchronization of time.

if you stick to a single source of truth - only one machine's time is used as a source of truth - then the problem disappears.

for example, instead of using java/your language's time() function (which could be out of sync across different app nodes), just use the database's internal CURRENT_TIMESTAMP() when writing to the db.
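
A sketch of that with sqlite3, just to make it concrete (the table and column names are made up; any RDBMS's equivalent works the same way):

    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("CREATE TABLE IF NOT EXISTS events (payload TEXT, created_at TEXT)")

    # let the database assign the timestamp; with a networked RDBMS the same
    # statement stamps every row with the DB server's clock, no matter which
    # app node issued the insert
    conn.execute(
        "INSERT INTO events (payload, created_at) VALUES (?, CURRENT_TIMESTAMP)",
        ("user signed up",),
    )
    conn.commit()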

another alternative is to compare timestamps with only minute- or hour-level precision, if you carry over time from one machine to another. That way you have a buffer of time for different machines to synchronize clocks over NTP

preseinger
if you can delegate synchronization/ordering to the monotonic clock of a single machine, then you should definitely do that :)

but that's a sort of trivial base case -- the interesting bit is when you can't make that kind of simplifying assumption

plandis
You can rephrase this question in terms of causality. Why does it matter that we know if some process happens before some other process at some defined(?) level of precision.

There are ways around this but they are restrictive or come at the cost of increased latency. Sometimes those are acceptable trade offs and sometimes they are not.

slt2021
the root of the problem is using clocks from different hosts (which could be out of sync) and carrying that time over from one machine to another - essentially assuming clocks across different machines are perfectly synchronized 100% of the time.

if you use a single source of truth for clocks (the simplest example is using the RDBMS's current_timestamp() instead of your programming language's time() function), the problem disappears

justsomehnguy
Imagine you have an account holding $200.

Now two operations come in, one adding $300, the other withdrawing $400. What would the result be, depending on the order of operations?
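
Worked through quickly (assuming a withdrawal that would overdraw the account is rejected), the two orderings land on different balances:

    def apply(balance, ops):
        for op in ops:
            if balance + op < 0:
                continue  # reject an overdrawing withdrawal
            balance += op
        return balance

    print(apply(200, [+300, -400]))  # deposit first: 500, then 100
    print(apply(200, [-400, +300]))  # withdrawal first: rejected, ends at 500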

shsbdncudx
It is, agree. Imho in most cases the right answer is to build it to not require that kind of clock synchronisation.

You can build systems that do not require physical clock synchronization, but using physical clocks often leads to simpler code and a major performance advantage.

That's why Google built TrueTime, which provides a physical time guarantee of [min_real_timestamp, max_real_timestamp] for each timestamp instant. You can easily know the ordering of 2 events by comparing the bounds of their timestamps, as long as the bounds do not overlap. In order to achieve that, Google tries to keep the bounds as small as possible, using the most accurate clocks they can find: atomic and GPS clocks.
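
A rough sketch of that interval comparison (not Google's actual API, just the idea): two events are only definitely ordered when their uncertainty intervals don't overlap.

    from typing import Optional

    class TTInterval:
        # TrueTime-style bound: the real time is somewhere in [earliest, latest]
        def __init__(self, earliest: float, latest: float):
            self.earliest = earliest
            self.latest = latest

    def definitely_before(a: TTInterval, b: TTInterval) -> Optional[bool]:
        if a.latest < b.earliest:
            return True    # a certainly happened first
        if b.latest < a.earliest:
            return False   # b certainly happened first
        return None        # intervals overlap: ordering unknown, wait or fall back

    # tight clocks (small intervals) make overlaps rare; that's why the
    # accuracy of the GPS/atomic references matters so much
    print(definitely_before(TTInterval(10.000, 10.002), TTInterval(10.005, 10.007)))  # True
    print(definitely_before(TTInterval(10.000, 10.004), TTInterval(10.003, 10.006)))  # None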

plandis
Yes, that is essentially the point of logical clocks :)
