6

In a mixed environment, where machines can be running under Windows (most), Linux (a few), sometimes Android ... what is the best solution to have time synchronization with an accuracy close to milliseconds?

We are developing a microservices based solution, where services are scattered on multiple machines within our setups. There are many situations where consolidating information between them (logs, monitoring, etc) requires a common time base.

Using NTP under Windows seems to have its share of limitations. Any open source solution that could be run on that operating system? We can't guarantee that there will always be a Linux machine in our setups.

  • NTP is just made for that, and I am not aware of any viable alternative, let alone a cross-platform one. NTP should run fine on all types of devices. You don't need to roll your own NTP server; there are public NTP servers that are perfectly fine. If you are worried about time lag to the external servers, don't be: NTP eliminates that by measuring the round-trip time and incorporating it into the measurement. – The Shurrican Jun 21 '15 at 12:00
  • Note that even with good synchronization, the different OSes will treat the leap second coming this month differently, hence there goes your millisecond accuracy ... – Hagen von Eitzen Jun 21 '15 at 14:08
  • `what is the best solution to have time synchronization with an accuracy close to milliseconds?` - Do you really, honestly need that degree of accuracy from a technical standpoint? `We are developing a microservices based solution` - What exactly are microservices? – joeqwerty Jun 21 '15 at 14:56
  • @joeqwerty: "Do you really, honestly need that degree of accuracy". Yes, we do. Some of the services are used to control hardware components (networked video servers). Tracking a problem on the network might require analysing logs / monitored data and correlating them with that granularity. – David Brabant Jun 21 '15 at 15:14
  • You are trying to achieve a precision which simply cannot be attained using standard x86/x64 systems. I will expand my answer to explain why later this evening. Still, you can get ntp to report its precision, see: http://www.ntp.org/ntpfaq/NTP-s-sw-clocks-quality.htm and http://nlug.ml1.co.uk/2012/01/ntpq-p-output/831 – ErikE Jun 21 '15 at 17:53
  • Meinberg provide a free Windows build of NTP for Windows, with an installer and a status monitor - https://www.meinbergglobal.com/english/sw/ntp.htm . – TessellatingHeckler Jun 22 '15 at 03:47
  • http://queue.acm.org/detail.cfm?id=2745385 – David Brabant Feb 19 '16 at 07:37

2 Answers

15

[EDIT] A major rewrite with references as I just jotted down the old answer from memory.

Short answer: no. It is not possible to get near-millisecond accuracy from a run-of-the-mill operating system on an x86/x64 platform today.

DISCLAIMER: This is a layman's answer, as I am an ordinary sysadmin with an ordinary sysadmin's view of computers. A professional level of knowledge of timekeeping is likely found among some kernel developers and hardware architects.

Long answer:

One has to start somewhere. I'll do this top down, starting with the applications and moving down towards the oscillator(s).

The first problem is not keeping time on one computer, but managing to get the environment as a whole to agree on whatever timekeeping you have. Which timekeeping? It turns out there are a couple of ways to keep time in a computer of today. The one we see the most of is the system time (as displayed in a corner of the screen). Let's start by pretending it's that simple and complicate things a couple of paragraphs down.

We want the system time to be correct, and we want it to be uniform across all of our computers. We need a way to communicate it from a trusted source at a granularity that meets our requirements, whichever they may be.

Let's make our requirement a tolerance level of 1 ms; that is, time may deviate at most 1 ms within our environment or we miss a critical goal. Let's get concrete and look at what Microsoft can do for us.

Excluding obsolete versions such as NT, Windows natively runs its timekeeping based on either simplified ntp (domain-joined computers beginning with XP/2003) or simplified sntp (non-domain-joined computers beginning with Win2k) - thanks to @Ryan for nitpicking this detail. Microsoft set two goals when making the timekeeping implementation, neither of which includes our desired level of accuracy:

"We do not guarantee and we do not support the accuracy of the W32Time service between nodes on a network. The W32Time service is not a full-featured NTP solution that meets time-sensitive application needs. The W32Time service is primarily designed to do the following:

  • Make the Kerberos version 5 authentication protocol work.
  • Provide loose sync time for client computers.

The W32Time service cannot reliably maintain sync time to the range of one to two seconds. Such tolerances are outside the design specification of the W32Time service."

OK. Assuming you are running your service stack on more than one computer and need a timekeeping tolerance approaching 1 ms for event correlation, that is quite a letdown. If the service stack spans even two computers, we actually can't use Windows native timekeeping at all. But while we're at it, let's underscore a key point or two about Windows native timekeeping, and include some thorough documentation:

If you have an AD, observe that time in a given domain will be synchronized from whichever DC holds the PDC Emulator role. Bringing correct time into the domain thus has to go via the Domain Controller running the PDC Emulator role; in a multi-domain forest this translates to the PDC Emulator of the forest root domain. From there, time is dispersed primarily to the PDC Emulators of the subdomains and to each domain member in a fan-out fashion (with some caveats). This process is documented here. Even more in-depth information is available here.

OK. What can we do?

To begin with, we need some other, more precise way to synchronize time throughout the environment. Assuming we can't run Linux ntpd or ntpd for Windows, you could take a look at a shareware client called Tardis, but there are likely many more out there to try.
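
Whatever client you end up with, it helps to be able to spot-check how far off a given machine actually is. Below is a minimal Python sketch of an SNTP-style query (a single request, with none of the filtering and clock discipline a real ntpd applies), just to make the offset and round-trip measurement mentioned in the comments concrete. The server name is a placeholder assumption; point it at whatever time source you trust.

```python
import socket
import struct
import time

NTP_SERVER = "pool.ntp.org"   # placeholder; use a source you trust
NTP_UNIX_DELTA = 2208988800   # seconds between the NTP epoch (1900) and the Unix epoch (1970)

def sntp_offset(server=NTP_SERVER, timeout=2.0):
    """Estimate (clock offset, round-trip delay) in seconds from one SNTP exchange."""
    packet = bytearray(48)
    packet[0] = 0x1B                      # LI=0, VN=3, Mode=3 (client request)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()                  # local time when the request leaves
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(48)
        t4 = time.time()                  # local time when the reply arrives
    # Server receive (t2) and transmit (t3) timestamps from the reply.
    t2 = struct.unpack("!I", data[32:36])[0] - NTP_UNIX_DELTA + struct.unpack("!I", data[36:40])[0] / 2**32
    t3 = struct.unpack("!I", data[40:44])[0] - NTP_UNIX_DELTA + struct.unpack("!I", data[44:48])[0] / 2**32
    offset = ((t2 - t1) + (t3 - t4)) / 2  # how far the local clock is from the server
    delay = (t4 - t1) - (t3 - t2)         # network round trip, excluding server processing
    return offset, delay

if __name__ == "__main__":
    off, rtt = sntp_offset()
    print(f"offset {off * 1000:+.3f} ms, round trip {rtt * 1000:.3f} ms")
```

Run it a few times against a nearby server; the spread of the reported offsets gives a rough feel for how tightly that particular machine could realistically be steered.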

We ran Tardis on a Win2k3 server acting as PDC Emulator which had a CMOS clock with a really large skew; for inexplicable historical reasons we had no choice but to synchronize the entire network from it. It has now been replaced, to great joy, with a dedicated Linux ntpd bringing time in from atomic clocks on the outside, but Tardis saved us admirably then and there. I don't know, however, whether it could help you achieve greater precision than Windows native timekeeping.

But let's assume from this point on that we have figured out how to implement a perfect substitute for network time synchronization, one whose inherent craftiness can keep tolerances below one millisecond, and that we have put it in place so as to respect how our AD expects time to spread through the network.

Does this mean that we can get accurate diagnostics out of operating systems and microservices at a granularity approaching single milliseconds?

Let's look at how operating systems on the x86/x64 architecture schedule processor time.

They use interrupts, which are multifaceted beasts rich in archaeological substance. However, the operating system is not alone in its desire to interrupt. The hardware wishes to interrupt too, and it has the means to do it! (Hello, keyboard.) And operating systems play along.

This is where it gets complicated, and I will solve this by oversimplifying. Questions? I duck, cover and point you to an absolutely excellent treatise on the subject. (If you're hunting milliseconds on a Windows platform, you really should read it.) An updated version for Win8.1/Win2012r2 is reportedly in the works, but no release date has yet surfaced.

OK, interrupts. Whenever something should happen in an OS, an interrupt triggers the action which follows. The action is a bunch of instructions fetched from the kernel, which can be executed in a whole lot of different manners. The bottom line is that although the interrupt happens at a time which can be determined with more or less accuracy depending on hardware architecture and kernel interrupt handling, the exact time at which the subsequent parts of execution happen generally cannot. A specific set of instructions may be executed early after the interrupt or late, in a predictable sequence or not, and it may be the victim of buggy hardware or poorly written drivers causing latencies that are hard to even recognize. Most of the time one simply doesn't know. The millisecond-level timestamp that shows up in the subsequent log file is very precise, but is it accurate as to when the event actually happened?
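
One way to feel that gap between precision and accuracy on a specific machine is to compare how many digits a timestamp prints against how often the underlying clock actually advances. A small Python probe (results vary widely by OS, hardware and interpreter version; treat it as an illustration, not a benchmark):

```python
import time

def smallest_wall_clock_step(samples=200_000):
    """Smallest observed change of the wall clock, i.e. how often it really updates."""
    last = time.time()
    smallest = float("inf")
    for _ in range(samples):
        now = time.time()
        if now != last:
            smallest = min(smallest, now - last)
            last = now
    return smallest

if __name__ == "__main__":
    step = smallest_wall_clock_step()
    print(f"a timestamp prints as: {time.time():.6f}")             # microsecond digits...
    print(f"smallest observed update step: {step * 1000:.3f} ms")  # ...updated this coarsely
    print("resolution reported by the OS:", time.get_clock_info("time").resolution)
```

On some systems the clock advances far more coarsely than the printed digits suggest, which is exactly the "precise but not necessarily accurate" situation described above.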

Let's stop briefly at the timekeeping interrupt. An interrupt comes with a priority level; the lowest level is where user applications (such as a standard service) get their processor time. The other (higher) levels are reserved for hardware and for kernel work. If an interrupt at a level above the lowest arrives, the system will pretend any lower-priority interrupts also in queue don't exist (until the higher-priority interrupts have been cared for). The ordinary applications and services running will in this way be last in line for processor time. In contrast, almost the highest priority is given to the clock interrupt. The updating of time will just about always get done in a system. This is an almost criminal oversimplification of how it all works, but it serves the purpose of this answer.

Updating time actually consists of two tasks (illustrated in the sketch after this list):

  • Updating the system time / AKA the wall clock / AKA what I say when someone asks me what time it is / AKA the thing ntp fiddles a bit back and forth relative to nearby systems.

  • Updating the tick count, used for instance when measuring durations in code execution.
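
Those two tasks map directly onto the two clock APIs most runtimes expose: an adjustable wall clock and a monotonic tick counter. A hedged Python illustration (the reported implementations and resolutions differ per platform):

```python
import time

# Wall clock: the thing ntp/W32Time steers; it can be stepped or slewed underneath you.
wall_before = time.time()

# Tick count / monotonic clock: only ever moves forward, so it is the one to use
# for measuring durations, even though it says nothing about the time of day.
tick_before = time.monotonic()

time.sleep(0.5)

print(f"wall clock advanced {time.time() - wall_before:.6f} s")
print(f"monotonic advanced  {time.monotonic() - tick_before:.6f} s")

# get_clock_info typically reports 'adjustable': True for the wall clock and
# False for the monotonic clock -- the two roles described in the list above.
print("wall clock :", time.get_clock_info("time"))
print("monotonic  :", time.get_clock_info("monotonic"))
```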

But whether it is wall time or tick count, where does the system get the time from? It depends greatly on the hardware architecture. Somewhere in the hardware one or several oscillators are ticking, and that ticking is brought via one of several possible paths into an interface for contact with the kernel as it, with greater or lesser precision and accuracy, updates its wall time and tick count.

There are several design models for oscillator placement in a multicore system, the major differentiator seems to be synchronous vs asynchronous placement. These together with their respective challenges to accurate timekeeping are described here for instance.

In short, synchronous timekeeping has one reference clock per multicore, which gets its signal distributed to all cores. Asynchronous timekeeping has one oscillator per core. It is worth noting that the latest Intel multicore processors (Haswell) use some form of synchronous design using a serial bus called "QuickPath Interconnect" with "Forwarded Clocking", ref. datasheet. The Forwarded Clocking is described in terms such that a layman (me) can get a quick superficial grasp on it here.

OK, so with all that nerderism out of the way (which served to show that timekeeping is a complex practical task with a lot of living history about it), let's look even closer at interrupt handling.

Operating systems handle interrupts using one of two distinct strategies: ticking or tickless. Your systems use one or the other, but what do the terms mean?

Ticking kernels send interrupts at fixed intervals. The OS cannot measure time at a finer resolution than the tick interval. Even then, the actual processing involved in performing one or several actions may well contain a delay greater than the tick interval. Consider for instance distributed systems (such as microservices), where delays inherent in inter-service calls can consume a comparatively large amount of time. Yet every set of instructions will be associated with one or several interrupts, measured by the OS at a resolution no finer than the kernel tick time. The tick time has a base value but can, at least in Windows, be decreased on demand by an individual application. That is an action associated not only with benefits but also with costs, and it carries quite a bit of fine print with it.
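
To get a feel for what the tick interval does to anything that waits, compare a requested sleep with what you actually get. Again a rough probe rather than a benchmark; the outcome depends heavily on OS, interpreter version and whatever else happens to be adjusting the timer resolution:

```python
import statistics
import time

def sleep_overshoot(requested=0.001, samples=200):
    """Measure how long a nominal 1 ms sleep really takes, on the monotonic clock."""
    actual = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested)
        actual.append(time.perf_counter() - start)
    return actual

if __name__ == "__main__":
    measured = sleep_overshoot()
    print(f"requested 1.000 ms, got min {min(measured) * 1000:.3f} / "
          f"median {statistics.median(measured) * 1000:.3f} / "
          f"max {max(measured) * 1000:.3f} ms")
    # The gap between request and result is scheduling plus tick granularity,
    # not the clock lying about how long things took.
```

On a stock Windows installation the median has classically landed near the 15.6 ms timer interval unless some application had raised the timer resolution; newer Windows and interpreter versions do better, and Linux is usually much closer to the request.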

So-called tickless kernels (which have a rather non-descriptive name) are a relatively new invention. A tickless kernel sets the tick time at variable intervals (as far into the future as possible). The reason is to let the OS dynamically allow processor cores to go into various levels of sleep for as long as possible, with the simple purpose of conserving power. "Various levels" include processing instructions at full speed, processing at decreased rates (i.e. slower processor speed) or not processing at all. Different cores are allowed to operate at different rates, and the tickless kernel tries to let processors be as inactive as possible, even if that includes queueing up instructions to fire them off in interrupt batches. In short, different cores in a multiprocessor system are allowed to drift in time relative to each other. This of course plays havoc with good timekeeping, and it is so far an unsolved problem with newer power-saving processor architectures and the tickless kernels which allow them to do efficient power saving. Compare this with a ticking kernel (static tick interval), which continually wakes all processor cores regardless of whether they have actual work or not, and where timekeeping carries a degree of inaccuracy, but a relatively dependable one compared to tickless kernels.

The standard Windows tick time - that is, the system timer resolution - is 15.6 ms up until Windows 8/2012, where the default behaviour is tickless (but revertible to a ticking kernel). The default Linux tick time, I believe, depends on the kernel compilation, but this niche is well outside my experience (and this one too), so you may wish to double-check if you depend on it. Linux kernels, I believe, can be compiled tickless from 2.6.21 onwards, with various flags optimizing the tickless behaviour (of which I only recall a few variants of no_hz).

So much for bare-metal systems. In virtual systems it gets worse, as VM and hypervisor contention in different ways makes accurate timekeeping extremely difficult. Here is an overview for VMware and here is one for RHEL KVM. The same holds true for distributed systems. Cloud systems are even more difficult, as we do not get even close to seeing the actual hypervisors and hardware.

To conclude, getting accurate time out of a system is a multilayered problem. Going now bottom-up from a high-level point of view, we have to solve: internal time synchronization between the hardware and the kernel; interrupt processing and the delays to the execution of the instructions we wish to timestamp; if in a virtual environment, the inaccuracies due to the encapsulation of a second OS layer; and the synchronization of time between distributed systems.

Therefore, at this point in the history of computing, we will not get millisecond-level accuracy out of an x86/x64 architecture, at least not using any of the run-of-the-mill operating systems.

But how close can we get? I don't know, and it ought to vary greatly between different systems. Getting a grip on the inaccuracy in one's own specific systems is a daunting task. One need only look at how Intel suggests code benchmarking should be done to see that ordinary systems, such as the ones I happen to find myself administering, are very much out of control in this respect.

I don't even contemplate achieving "All power optimization, Intel Hyper-Threading technology, frequency scaling and turbo mode functionalities were turned off" in critical systems, much less tinkering with code wrappers in C and running long-term tests to get subsequent answers. I just try to keep them alive and learn as much as I can about them without disturbing them too much. Thank you, timestamp, I know I can't trust you fully, but I do know you're not too many seconds off. When actual millisecond accuracy does get important, one measurement is not enough; a greater number of measurements is needed to verify the pattern. What else can we do?
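
In the same spirit as the SNTP sketch earlier, one can at least sample the offset repeatedly rather than trust a single reading: discard the samples with the largest round trips (the ones most distorted by network queueing) and look at the median and spread of the rest. The server name is again a placeholder, and this reuses the sntp_offset() helper from the earlier sketch:

```python
import statistics

def repeated_offsets(server, rounds=20):
    """Collect several SNTP offset samples and keep the ones with the shortest round trip."""
    samples = []
    for _ in range(rounds):
        try:
            samples.append(sntp_offset(server))   # (offset, delay) from the earlier sketch
        except OSError:
            continue                              # a lost packet is just a missing sample
    if not samples:
        raise RuntimeError("no replies received")
    samples.sort(key=lambda s: s[1])              # sort by round-trip delay
    best = [offset for offset, _ in samples[: max(3, len(samples) // 4)]]
    return statistics.median(best), statistics.pstdev(best)

if __name__ == "__main__":
    med, spread = repeated_offsets("pool.ntp.org")    # placeholder server
    print(f"median offset {med * 1000:+.3f} ms, spread {spread * 1000:.3f} ms")
```

If the spread itself approaches a millisecond, no amount of careful timestamping in the application layer will make the correlated logs trustworthy at that granularity.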

Lastly, it is interesting to look at how the real-time OS people think about interrupt latency. There is also a very exciting time sync alternative in the works, where quite a bit of interesting statistics, methodology and whitepapers are made public. Add future hardware architecture and kernel developments to that, and in a few years this timekeeping accuracy thing may no longer be such a problem. One may hope.

ErikE
  • Great answer. Just one nitpick: modern Windows doesn't really use SNTP. Maybe Win 2000/2003/XP did, but not any more. Windows Time is like, 90% NTP. It's uh... almost NTP. I don't know why Microsoft chose to only partially implement NTP, but it's a custom implementation that isn't SNTP and isn't quite NTP either. – Ryan Ries Jun 28 '15 at 13:31
  • You mentioned NTP quite a few times, what about [PTP](https://en.wikipedia.org/wiki/Precision_Time_Protocol)? Would that help Linux systems to be better synchronized? That document seems to say that the accuracy can come as close as 1ns... – Alexis Wilke Sep 24 '15 at 09:18
  • The accuracy of PTP may be 1 ns, but the entire stack from hardware through OS and application needs to operate timekeeping at that level of accuracy in order for you to practically achieve it. Removing one weak link is a step, but neither x86/x64 nor an ordinary Linux kernel's interrupt handling will get even close, AFAIK. You need some kind of real-time platform too in order to realize the capability of PTP, methinks. – ErikE Sep 24 '15 at 09:33
  • I think this doc shows one way how one may approach the problem in practical terms, it is not for the faint of heart and kind of illustrates how difficult it is to achieve what you are asking about using a random ordinary set of business applications: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf – ErikE Sep 24 '15 at 09:39
  • Sorry, missed pinging you @AlexisWilke – ErikE Sep 24 '15 at 10:19
  • Yes, reading the other answer, having a PCI card or something similar that gets GPS time would probably be best for synchronization. Even though it does not solve the minimal variance introduced by the CPU and other hardware parts... – Alexis Wilke Sep 25 '15 at 21:26
1

Natively, Microsoft operating systems use time.windows.com. If you need something more specific, I would advise using a NIST Internet Time Server. They even run authenticated NTP if you're concerned about tampering. If this still doesn't suffice, you can always run your own. There are a number of vendors that sell stratum 1 or stratum 2 NTP servers which you can just plug into your network. Stratum refers to the distance from the reference clock: a stratum 1 server is directly attached to its own reference clock (GPS, CDMA, an atomic clock), whereas a stratum 2 server synchronizes from stratum 1 servers.

user2320464
  • Would n computers, all fitted with a GPS PCI card, have a real chance of being closely synchronized? Would that be better/closer than using a method such as NTP, which relies on the network? – Alexis Wilke Sep 24 '15 at 09:13
  • Absolutely! However, this solution will likely cost more in terms of hardware, and a good signal is needed at each system. It is much easier to wire up an antenna (or antennas) to a single device and use NTP, as it leverages the existing network. – user2320464 Sep 26 '15 at 17:56