For a small web app, fine, but if you're running enterprise-level software processing billions of DB transactions per day, clocks just don't cut it.
Race conditions are mitigated not by clocks but by other logic. The clock was added only after the frustration of reading distributed logs and seeing them out of order. Logs are basically never out of order any more, and there is sanity.
Not everyone has the luxury of being able to procure and install hardware and/or run an antenna to someplace with GPS reception.
NTP can fail, chrony can fail, and system clocks can always drift undetectably
you can treat the system clock as an optimistic guess at the time, but it's never a reliable way to order anything across different machines
node clocks are unreliable by definition; it's a fundamental invariant of distributed systems
Node clocks can be plenty reliable, but like any other hardware, they sometimes develop defects.
the A->B link is under DDoS or whatever and delivers packets with 10s latency
the A->C link is faulty and has 50% packet loss
the A->{D,E,F} links are perfectly healthy
node B has one view of A's skew which is pretty bad, node C has a different view which is also pretty bad for different reasons, and nodes D, E, and F have a totally different view which is basically perfect
you literally cannot "detect skew" in a way that's reliable and actionable
these issues are not a function of the node alone; they're a function of everything between the node and the observer, and they differ for each observer
even if clocks were perfectly accurate, there is no such thing as a single consistent time across a distributed system. two events arriving at two nodes at precisely the same moment still require some amount of time to be communicated to other nodes in the system; that time is bounded by the speed of light, and the resulting "light cone" defines a physical limit on the propagation of information
Seems like poor engineering practice.
logical causality does not represent poor engineering practice :)
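For illustration, here's a minimal sketch of the kind of logical clock being referred to (a Lamport clock): events get ordered by causality rather than by wall-clock time. The class and method names are made up for this example, not taken from any particular library.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal Lamport clock sketch: orders events by causality, not wall time.
// Class and method names are illustrative, not from any particular library.
public class LamportClock {
    private final AtomicLong counter = new AtomicLong(0);

    // Call for every local event; returns that event's logical timestamp.
    public long tick() {
        return counter.incrementAndGet();
    }

    // Call when a message arrives carrying the sender's logical timestamp.
    // The receiver's clock jumps past anything it has causally observed.
    public long receive(long remoteTimestamp) {
        return counter.updateAndGet(local -> Math.max(local, remoteTimestamp) + 1);
    }

    public static void main(String[] args) {
        LamportClock a = new LamportClock();
        LamportClock b = new LamportClock();

        long t1 = a.tick();       // event on node A
        long t2 = b.receive(t1);  // A's message arrives at B
        long t3 = b.tick();       // later event on B

        // t1 < t2 < t3 regardless of what the machines' wall clocks say.
        System.out.printf("t1=%d t2=%d t3=%d%n", t1, t2, t3);
    }
}
```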
if you stick to a single source of truth - only one machine's clock is ever consulted - then the problem disappears.
for example, instead of using Java's (or your language's) time() function, which could be out of sync across different app nodes, just use the database's internal CURRENT_TIMESTAMP() when writing to the DB.
another alternative is to compare timestamps only at 1-minute or 1-hour precision if you carry time over from one machine to another. That way you have a buffer of time for different machines to synchronize their clocks over NTP.
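Roughly what the single-source-of-truth approach looks like with plain JDBC; the connection URL and the events/created_at table here are hypothetical, and the only point is that the database's clock, not the app node's, stamps the rows:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch of "single source of truth for time": let the database fill in the
// timestamp instead of passing the app node's own clock value.
// Connection URL, table, and column names are hypothetical.
public class DbClockExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/app", "app", "secret")) {
            // Skewed: each app node stamps rows with its own clock, e.g.
            //   INSERT INTO events (payload, created_at) VALUES (?, ?)
            //   with new Timestamp(System.currentTimeMillis())

            // Single source of truth: only the database's clock is consulted.
            String sql = "INSERT INTO events (payload, created_at) "
                       + "VALUES (?, CURRENT_TIMESTAMP)";
            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setString(1, "order-created");
                stmt.executeUpdate();
            }
        }
    }
}
```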
There are ways around this, but they are restrictive or come at the cost of increased latency. Sometimes those are acceptable trade-offs and sometimes they are not.
if you use a single source of truth for clocks (the simplest example is using the RDBMS's current_timestamp() instead of your programming language's time() function), the problem disappears
Now two operations come in, one adding $300, the other withdrawing $400. What will the result be, depending on the order of operations?
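A toy illustration of why the order matters, under two assumptions that aren't stated above: the account starts at $200, and withdrawals that would overdraw it are rejected.

```java
// Toy example: the final balance depends entirely on the order in which the
// two operations are applied. Starting balance ($200) and the no-overdraft
// rule are assumptions made for illustration.
public class OrderingExample {
    static int apply(int balance, int[] operations) {
        for (int op : operations) {
            if (op < 0 && balance + op < 0) {
                continue; // reject: insufficient funds
            }
            balance += op;
        }
        return balance;
    }

    public static void main(String[] args) {
        System.out.println(apply(200, new int[]{+300, -400})); // deposit first  -> 100
        System.out.println(apply(200, new int[]{-400, +300})); // withdraw first -> 500 (withdrawal rejected)
    }
}
```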
That's why Google built TrueTime, which provides a physical-time guarantee of [min_real_timestamp, max_real_timestamp] for each timestamp instant. You can easily determine the ordering of two events by comparing the bounds of their timestamps, as long as the bounds do not overlap. To achieve that, Google tries to keep the bounds as small as possible, using the most accurate clocks they can find: atomic and GPS clocks.
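A rough sketch of the idea (not Google's actual API): each timestamp is an interval guaranteed to contain the true physical time, and one event definitely precedes another only when their intervals don't overlap.

```java
// Sketch in the spirit of Spanner's TrueTime: every timestamp is an interval
// [earliest, latest] guaranteed to contain the true physical time.
// This is an illustration, not Google's actual API.
public record TimeInterval(long earliestMicros, long latestMicros) {

    // Event A definitely happened before event B only if A's upper bound is
    // below B's lower bound; otherwise the intervals overlap and the ordering
    // cannot be decided from the timestamps alone.
    public boolean definitelyBefore(TimeInterval other) {
        return this.latestMicros < other.earliestMicros;
    }

    public static void main(String[] args) {
        TimeInterval a = new TimeInterval(1_000, 1_007); // small uncertainty window
        TimeInterval b = new TimeInterval(1_020, 1_027);

        System.out.println(a.definitelyBefore(b)); // true: bounds do not overlap
        System.out.println(b.definitelyBefore(a)); // false
    }
}
```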
Sentences like this will make me never regret moving my infrastructure to bare metal. My clocks are synchronized down to several nanoseconds, with leap-second handling and all kinds of shiny things. It literally took a day to set up and a blessing from an ISP in the same datacenter to use their clock sources (GPS + PTP). All the other servers are synchronized to that one via Chrony.