25

The chrony documentation warns

BE WARNED: Certain software will be seriously affected by such jumps in the system time. (That is the reason why chronyd uses slewing normally.) Documentation

But the documentation gives no examples. What are examples of software that will be seriously affected? Is the OS or any background processes at risk?

Cecilia
  • 8
    The downside of providing examples is that some folk will think you've given them a complete list... – John Gordon Feb 28 '23 at 04:09
  • Anything that calculates an average over a fixed amount of time and records it. – Simon Richter Mar 01 '23 at 17:18
  • 1
    Are you considering mistakes causes by people using the software? For example, in some places when switching daylight savings time, trains are scheduled to hold/do not depart during the one hour which could be ambiguous. – Quora Feans Mar 01 '23 at 18:57
  • Pretty much anything that uses certificates. – bviktor Mar 02 '23 at 15:57
  • @QuoraFeans No, I'm just looking for software issues. – Cecilia Mar 02 '23 at 17:29
  • @JohnGordon I understand that a list will not be comprehensive. I just want enough understanding to weigh the risks of using a time jump to synchronize the clocks. – Cecilia Mar 02 '23 at 17:31
  • I'm always surprised when my screen locks and goes dark. – stark Mar 02 '23 at 23:52

10 Answers

32

This is a bit of an open question, but let me give some examples:

  • databases - most of them rely heavily on precise time for storing records, indexes, etc.
  • security - precise time is essential for mapping actions to times; gaps or duplicated timestamps are not acceptable
  • digital signing - the timestamp is usually part of the signed document, so a wrong time may invalidate the signature
  • scheduling software - may skip jobs or run them twice, depending on the direction of the time jump
  • clustering software - most clusters need their nodes to be in sync, and a jump on one or more nodes may have unpredictable results
Romeo Ninov
  • 11
    Task scheduling software like cron and [xcron](https://github.com/cubiclesoft/xcron) can be affected by wild swings in system time. Regular cron is especially susceptible, since it can skip scripts or run them multiple times depending on the direction in which time moves, whereas several cron replacements keep track of which scheduled runs have happened and which were missed. – CubicleSoft Feb 27 '23 at 15:08
  • 1
    @CubicleSoft, correct, thank you. Will add it in my answer :) – Romeo Ninov Feb 27 '23 at 15:32
  • 3
    Also, anything that's doing cluster coordination. zookeeper, consul, etcd... this makes _clustered databases_ particularly sensitive; as an extreme example, Google's Spanner was designed assuming atomic clocks (or GPS equivalent) at every location. – Charles Duffy Feb 28 '23 at 00:11
  • 1
    @CharlesDuffy, correct, thank you. Will add it too :) – Romeo Ninov Feb 28 '23 at 05:19
  • Monitoring software - may detect that a service did not respond after a timeout, although there was no timeout in reality. – rexkogitans Mar 01 '23 at 08:33
  • @rexkogitans, you are right. But IMHO the end result would be at most one more ticket created by the monitoring software, not a serious service impact. – Romeo Ninov Mar 01 '23 at 08:38
  • @RomeoNinov I am thinking of software like Monit which may be configured to restart non-responding services. – rexkogitans Mar 01 '23 at 14:07
  • @rexkogitans, such services are questionable... I have not seen big corporations use them. If they need continuous service, they use clusters or a bunch of servers behind load balancers. – Romeo Ninov Mar 01 '23 at 14:27
  • 1
    I think that a lot of clustering software uses a system-independent clock, such as a Lamport clock, to synchronise its inner workings instead of relying on an untrusted OS time source. – Hauleth Mar 02 '23 at 13:42
  • Authentication (separate from security/logging) is regularly limited by time too. Auth tokens, certificates, sessions, etc. A common one is kerberos, which will fail for >5 minute gaps by default, and Radius can default even lower – Cpt.Whale Mar 09 '23 at 20:09
  • @Cpt.Whale, see second point :) – Romeo Ninov Mar 09 '23 at 20:22
13

I recently got bit by a bug that dates back to 1999 and affects both the JVM and Android Runtime: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4290274

... two extra executions are fired (unexpectedly) when the system clock is set ahead one minute after the task is scheduled using scheduleAtFixedRate().

I work on a device that starts with the 1970 epoch as the current time, then receives the correct network time a little later. Occasionally a 3rd party library would initialize before the time was set, causing it to experience a 50 year time jump.

The result was scheduleAtFixedRate attempting to catch up on ~50 years worth of invocations... which was about 27 million back-to-back invocations with no delay between them.

That would cause the GC to go haywire and generally bog down the system until it was restarted.
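
The size of that backlog follows from simple arithmetic. Below is a minimal sketch in Python, not the JVM's actual scheduler. It assumes a one-minute period, which is roughly what 27 million runs over 50 years implies. The names `missed_runs`, `runs_to_fire` and `MAX_CATCH_UP` are hypothetical, and capping the catch-up is just one possible mitigation.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cap on how many overdue runs we are willing to replay.
MAX_CATCH_UP = 1


def missed_runs(next_due, now, period):
    """Number of fixed-rate executions that are overdue at `now`."""
    if now < next_due:
        return 0
    # The run due exactly at `next_due` counts as well, hence the +1.
    return (now - next_due) // period + 1


def runs_to_fire(next_due, now, period):
    """Replay at most MAX_CATCH_UP overdue runs instead of all of them."""
    return min(missed_runs(next_due, now, period), MAX_CATCH_UP)


# Assumed scenario: the task is scheduled while the clock still reads 1970,
# then the clock is stepped to the correct date.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
after_sync = datetime(2020, 1, 1, tzinfo=timezone.utc)
one_minute = timedelta(minutes=1)

backlog = missed_runs(epoch + one_minute, after_sync, one_minute)
print(backlog)  # tens of millions of overdue runs
print(runs_to_fire(epoch + one_minute, after_sync, one_minute))
```

A scheduler that replays every overdue run executes the whole backlog back to back. One that caps the replay, or that schedules on a clock that cannot be stepped, does not.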

9

All software that interacts with real-life hardware. If you have a toaster that toasts bread for 20 seconds, and its software is stupid enough to check against the wall clock, you'll get either white or burned bread if the clock is corrected while you wait for your toast.

Practically all applications that control any kind of industrial device need precise timings, like, for example, "open a valve for 5.3 seconds to get the correct amount of fluid". Being off by more than a few milliseconds ruins your product.

Applications that position anything using motors will use either stepper motors (which are slow) or end switches to determine when to stop. But often you don't have a switch at every important position, so you resort to logic like "x m/s for A milliseconds, then y m/s for B milliseconds". Now imagine your NTP daemon adjusts the time by even a single millisecond while this logic is running ...
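
One way to avoid that failure mode is to time the interval on a clock that the NTP daemon cannot step. Below is a minimal sketch in Python, assuming `time.monotonic()` is available on the platform. `hold_for` and the commented-out valve calls are hypothetical names, not part of any real controller API.

```python
import time


def hold_for(seconds, clock=time.monotonic, sleep=time.sleep):
    """Wait until `seconds` have passed on `clock` and return the elapsed time.

    time.monotonic() is unaffected by steps of the system clock, so a time
    jump during the wait neither shortens nor lengthens it.
    """
    start = clock()
    deadline = start + seconds
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            break
        # Sleep in short slices so the deadline is re-checked regularly.
        sleep(min(remaining, 0.01))
    return clock() - start


# open_valve()   # hypothetical hardware call
elapsed = hold_for(0.05)
# close_valve()  # hypothetical hardware call
print(f"held for {elapsed:.3f} s")
```

The same loop written with `time.time()` would end early or late by the size of any clock step that happens during the wait. On Linux the monotonic clock can still be slewed by NTP, so it is steady rather than perfectly accurate.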

Guntram Blohm
  • 3
    If stuff is millisecond-precise then you want a time jump. Slewing is going to make the clock inaccurate over a long period of time. A time jump will only screw you up once. – user71659 Feb 28 '23 at 21:28
  • 3
    All such software *should* be programmed to use the monotonic clock which does not jump. That doesn't mean it is. – user253751 Mar 01 '23 at 18:03
  • 3
    @user253751: A monotonic clock is allowed to jump (but only forward); what you want is the steady clock – Ben Voigt Mar 01 '23 at 23:51
  • @BenVoigt operating systems I'm familiar with call the clock which does not jump at all "monotonic" – user253751 Mar 02 '23 at 11:49
4

We had an issue with an on-vehicle embedded system where the clock would lose a significant amount of time (due to an electrical problem). But the wireless connections were intermittent, so the time was only occasionally corrected. The upshot was that when the vehicles finally received wireless, and then an NTP update, the clock would jump forward significantly.

Various systems were checking the "last valid" time of certain things like GPS readings, etc. Suddenly all of these were "old", despite being updated only 0.5 seconds before.

Obviously a reconfiguration could fix the issue, but it was an issue.

TRiG
Kingsley
  • 1
    My personal view is that your system was not well designed. GPS satellites, for example, broadcast precise time that can be used for time sync, and this time source is widely used on systems without network connectivity. – Romeo Ninov Mar 02 '23 at 08:30
  • 1
    @RomeoNinov - Well, the core tenet of engineering is that it's a bunch of trade-offs. We traded using precise GPS time for having time when there was no GPS - like when the vehicle has just started, in a workshop, or underground. Your system can't just say "hang on a minute" when the customer drives off without a GPS fix. When there's a good GPS signal, we can (and do) use it to sync the clock. But it's simply not always available. (And before someone else mentions it, GPS repeaters are not accurate enough for our hardware). – Kingsley Mar 02 '23 at 09:12
  • Systems should use monotonic clock instead of wall clock time for operations that do not require wall clock time. Also it is advisable to have your own NTP server running that collects time from multiple sources - GPS and external NTPs. But I can see why an embedded system may choose to keep things simpler. edit: now I see that in alfgaar's answer. – akostadinov Mar 02 '23 at 14:13
2

The Dovecot IMAP server is affected: older versions deliberately kill themselves when they detect that the system time has jumped backwards. In v2.0, it at least tries to remedy the situation.

See https://wiki.dovecot.org/TimeMovedBackwards

2

Plenty of examples...

filo
1

It's already in a comment, but I thought I'd post it as an answer too:

Applications that should have used the steady monotonic clock but don't are also affected. For example, if software checks client keep-alives using the current time, a jump in time may kick out all clients.

I regularly see software using the wrong clock.

Halfgaar
  • 1
    This is my preferred technique when writing embedded software. We keep track of elapsed time from power on/reset, and use that in calculations that measure elapsed time. For "wall clock" or current time/date, you maintain a "skew" value that you can add to your elapsed time value. That skew will adjust up/down by a few seconds when you sync with GPS or ntp, but you're typically only using it in your UI or to generate headers. – tomlogic Mar 10 '23 at 17:37
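
The technique described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SkewedClock` and its method names are hypothetical, and `time.monotonic()` stands in for "elapsed time since power on/reset" (its zero point is unspecified, so only differences between readings are meaningful).

```python
import time


class SkewedClock:
    """Wall-clock time derived from a monotonic counter plus an adjustable skew."""

    def __init__(self, monotonic=time.monotonic):
        self._monotonic = monotonic
        self._skew = 0.0  # seconds added to the monotonic reading to get Unix time

    def elapsed(self):
        """Monotonic seconds; use this for timeouts and interval measurements."""
        return self._monotonic()

    def sync(self, reference_unix_time):
        """Re-anchor the wall clock to a trusted reference (GPS, NTP, ...)."""
        self._skew = reference_unix_time - self._monotonic()

    def wall_time(self):
        """Unix time for the UI or for generated headers only."""
        return self._monotonic() + self._skew


clock = SkewedClock()
start = clock.elapsed()
clock.sync(1_678_470_000.0)  # pretend a GPS/NTP fix just arrived
# The interval measurement is untouched by the sync; only wall_time() moved.
print(clock.elapsed() - start)
print(clock.wall_time())
```

Because `sync` changes only the skew, a correction of any size never shows up in code that measures intervals with `elapsed()`.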
1

Most game engines use an update loop that takes the delta between the previous and the current time. Sometimes a time change or a program suspend/resume makes this delta huge. Typically you just filter out large deltas as outliers.
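
A minimal sketch of such a filter in Python. Clamping the delta is one possible way to handle outliers, chosen here for illustration; `frame_delta` and the `MAX_DELTA` threshold are hypothetical and not taken from any particular engine.

```python
import time

MAX_DELTA = 0.25  # seconds; hypothetical upper bound for one simulation step


def frame_delta(previous, current, max_delta=MAX_DELTA):
    """Time step for one update, clamped into the range [0, max_delta]."""
    return min(max(current - previous, 0.0), max_delta)


previous = time.monotonic()
for _ in range(3):
    time.sleep(0.01)  # stand-in for rendering one frame
    current = time.monotonic()
    dt = frame_delta(previous, current)
    previous = current
    # update(dt) would advance the game state here (hypothetical call)
    print(f"dt = {dt:.4f} s")
```

Clamping also covers a negative delta, which can occur when the loop reads a clock that was stepped backwards.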

0

Everyday normal web browsing

Really.

Anything to do with encryption deals in certificates. The certificates must be validated before they are accepted. Part of the validation process is checking that the certificate has not expired, which obviously involves your computer clock. If you let your computer clock get too far out of sync with reality, certificate validation on the computer will fail.
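
The time-dependent part of that check boils down to a range comparison against the certificate's `notBefore` and `notAfter` fields. A minimal sketch in Python, covering only this comparison (real TLS libraries also verify the signature chain, host name and more). The certificate dates and the function name `is_within_validity` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_within_validity(not_before, not_after, now):
    """True if `now` falls inside the certificate's validity window."""
    return not_before <= now <= not_after


# Hypothetical certificate, valid for 90 days.
not_before = datetime(2023, 1, 1, tzinfo=timezone.utc)
not_after = not_before + timedelta(days=90)

correct_now = datetime(2023, 3, 1, tzinfo=timezone.utc)
clock_far_ahead = correct_now + timedelta(days=60)
clock_reset = datetime(1970, 1, 1, tzinfo=timezone.utc)

print(is_within_validity(not_before, not_after, correct_now))      # True
print(is_within_validity(not_before, not_after, clock_far_ahead))  # False
print(is_within_validity(not_before, not_after, clock_reset))      # False
```

A clock that is far ahead makes a good certificate look expired, and a clock that is far behind makes it look not yet valid.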

This matters, because pretty much every web page you access these days is transmitted via HTTPS, which uses TLS encryption (and certificate validation) to ensure the integrity of the page contents.

In other words, if you let your clock get off, you might not be able to even browse the web normally.

Now playing with an NTP daemon — where the whole point is keeping your system clock more or less accurate — is unlikely to create a shift large enough to matter. But point it at the wrong time source, and you could easily create this effect.

Additionally, a number of things that deal in authentication rely on the clocks between the user's computer and the server being relatively in sync, with tolerances limited to sometimes no more than a few minutes difference.

Joel Coel
  • Chromium has a "sane time" system to detect this problem: https://www.chromium.org/developers/design-documents/sane-time/ – Lambda Fairy Mar 09 '23 at 01:58
0

Timing Things

It may seem obvious, but according to Falsehoods Programmers Believe About Time, developers often use the system time to measure how long a process takes, either because they do not know about monotonic time sources or because their platform lacks one. If the system clock changes between the two measurements, the result is wrong. The measured duration can be:

  • slightly larger than the correct value;
  • negative (which will likely crash a system relying on a positive value);
  • hugely larger than the correct value, when a negative number goes through an incorrect conversion to an unsigned type (a signed 32-bit -1 has the same memory representation as the maximum unsigned 32-bit value, 0xFFFFFFFF).
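
```c
#include <stdio.h>

int main(void) {
    // prints 4294967295 where unsigned int is 32 bits wide
    printf("%u\n", (unsigned int) -1);

    // this comparison is true under the same assumption
    if (0xFFFFFFFF == (unsigned int) -1) {
        puts("equal");
    }
    return 0;
}
```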

Cache Invalidation

As a wise man once said:

There are only two hard things in Computer Science: cache invalidation and naming things. Phil Karlton

Distributed systems often use a short cache lifetime, ranging from a conservative 60 s or more (which most dynamic DNS setups use) down to a few milliseconds.

With NTP, computers on a local network can typically be synced to within a millisecond of correct UTC, or to within a few milliseconds over the internet.

With that in mind, even a sub-millisecond call to a cache server in the local network (like Redis) can itself be cached in local memory for nanosecond response times.

However, there is a thing called a leap second, which makes this kind of aggressive millisecond caching very hard, because the reference clock can end up one second ahead of or behind the local clock.

A difference of +1 s or -1 s can mean that the value we think is current is not actually the most recent one, or that a value that is still correct is treated as already too old. In the latter case the system queries the source of truth too often, which slows it down or even crashes it.
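
For the local in-memory layer, one common mitigation is to compute expiry deadlines on a monotonic clock instead of the wall clock. The sketch below is only an illustration: `TTLCache` is a hypothetical class, not an API of Redis or of any caching library. It covers expiry inside a single process and does nothing for timestamps compared across machines.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire `ttl` seconds after insertion."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (value, expiry deadline on `clock`)

    def put(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, deadline = entry
        if self._clock() >= deadline:
            del self._entries[key]  # expired: drop it and report a miss
            return default
        return value


# Simulate a wall clock that is stepped back by one second after the insert.
wall = [1000.0]
stepped = TTLCache(ttl=0.05, clock=lambda: wall[0])
stepped.put("price", 42)
wall[0] += 0.2   # 200 ms pass: the 50 ms entry should be long gone ...
wall[0] -= 1.0   # ... but the clock is stepped back by one second
print(stepped.get("price"))  # the stale value 42 is still served
```

With the default `time.monotonic` clock, the same step cannot extend or shorten an entry's lifetime.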