
I have a virtual environment running several Linux boxes, and I'm planning how to manage the NTP architecture.

As far as I know, there is no point in having exactly two servers in the ntp.conf file; a client should point to either one NTP server or more than three. So my first approach was to have one server, 'server1', pointing to 4 public servers (specifically the RHEL pool ones), then another box, 'server2', pointing to server1, and below that all my other Linux servers pointing to server2.

But I have observed weird behaviour with this architecture: some of the machines drift out of sync with server2, and sometimes even server1 and server2 are not perfectly synced with each other.

My first question is: why is that happening?

Then I came up with another architecture: the same server1 pointing to the public NTP servers, then three servers, 'server2', 'server3' and 'server4', pointing to server1, and below them all my other machines pointing to servers 2-4.

  • Is there a chance this architecture could improve syncing across my whole network?

  • Or would the sync performance be the same? What would be the best way to architect this?
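For reference, a rough sketch of the relevant ntp.conf lines in that second layout (server1 itself keeps the public-pool config shown in the edit below); the hostnames are just placeholders for my machines:

# on server2, server3 and server4
server server1 iburst

# on every other machine
server server2 iburst
server server3 iburst
server server4 iburst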

Edited

Here is the output of ntpq -p from server1:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*Time100.Stupi.  .PPS.            1 u  317 1024  377  182.786    5.327   3.022
 LOCAL(0)        .LOCL.          10 l  46h   64    0    0.000    0.000   0.000

And here is its ntp.conf:

# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery

# Permit all access over the loopback interface.  This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1

# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server 0.rhel.pool.ntp.org iburst
server 1.rhel.pool.ntp.org iburst
server 2.rhel.pool.ntp.org iburst
server 3.rhel.pool.ntp.org iburst
#broadcast 192.168.1.255 autokey        # broadcast server
#broadcastclient                        # broadcast client
#broadcast 224.0.1.1 autokey            # multicast server
#multicastclient 224.0.1.1              # multicast client
#manycastserver 239.255.254.254         # manycast server
#manycastclient 239.255.254.254 autokey # manycast client

# Enable public key cryptography.
#crypto

includefile /etc/ntp/crypto/pw

# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys

# Specify the key identifiers which are trusted.
#trustedkey 4 8 42

# Specify the key identifier to use with the ntpdc utility.
#requestkey 8

# Specify the key identifier to use with the ntpq utility.
#controlkey 8

# Enable writing of statistics records.
statistics clockstats cryptostats loopstats peerstats sysstats rawstats

### Added by IPA Installer ###
server 127.127.1.0
fudge 127.127.1.0 stratum 10

Here is the output of three of the clients:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*server1         172.16.29.21     3 u    1   64    1    1.090   -0.138   0.036


     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*server1         172.16.29.21     3 u 1035 1024  377    1.117   -1.943   0.530


     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*server1         172.16.29.21     3 u   32   64    1    0.902    1.788   0.140
  • How are you determining that the servers are "desyncing" and what's your definition of "perfectly synced"? It might be different to NTP's definition: https://tools.ietf.org/html/rfc5905#section-4 (last paragraph on page 8). – Paul Gear Aug 29 '17 at 23:23
  • Because at some point the difference between the clients and the server goes beyond 3 minutes. – Edgar Sampere Aug 30 '17 at 18:02
  • Note that NTP in VMs is/was generally not advised: Hypervisors have better ways to synchronize the clocks between host and virt. – Martin Schröder Aug 30 '17 at 19:25
  • What is your source on that? AFAIK, NTP is recommended on physical and virtual hosts. "The use of NTP is always recommended on both the RHV host as well as in the RHEL or Windows guest due to timing issues present with all virtual machines." Source: https://access.redhat.com/solutions/27865 – Aaron Copley Aug 30 '17 at 21:45
  • I don't feel like this is a RHV specific thing, but I wouldn't mind an education. – Aaron Copley Aug 30 '17 at 21:53
  • I got the base script from a Red Hat lab and the architecture was here before I arrived, that's why I'm asking opinions before doing the new one. Which would be a better way to sync between hypervisor and VM? – Edgar Sampere Aug 30 '17 at 22:31
  • @MartinSchröder NTP in VMs is definitely both required and workable - any advice to the contrary is out-of-date. I've provided data for KVM on Linux which demonstrates this at https://libertysys.com.au/2016/12/the-school-for-sysadmins-who-cant-timesync-good-and-wanna-learn-to-do-other-stuff-good-too-part-5-myths-misconceptions-and-best-practices/#myth-you-don8217t-need-ntp-in-vms and I've found it to be true with all the major public clouds' hypervisors as well. – Paul Gear Aug 30 '17 at 23:26
  • @MartinSchröder Here's an example where having host clock sync on caused major problems: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1676635 and here's the data I gathered to show it with & without host clock sync: https://people.canonical.com/~paulgear/azure-clock/ – Paul Gear Aug 30 '17 at 23:30
  • @EdgarSampere 3 minutes is a very large gap by NTP reckoning, but that's not shown in your ntpq output - they are all very close in time. Where are you getting the 3-minute figure from? – Paul Gear Aug 30 '17 at 23:32
  • @PaulGear: Fine, IIRC this was years ago for VMware VMs. – Martin Schröder Aug 31 '17 at 05:31
  • @MartinSchröder Well, that is because I restarted the daemon about two days ago. Two weeks ago I had to centralize the architecture because of that gap, but now I believe, as has been demonstrated here, that centralizing is also a bad idea. – Edgar Sampere Aug 31 '17 at 17:56

2 Answers


Depending on how critical timekeeping is in your environment, you may not want server1 to be a single point of failure. If you have to take it offline for maintenance or repair for an extended period of time, the hosts that sync from it will stop syncing. It is all downhill from there.

Why not have server1, server2, server3, and server4 all sync to 4 or 5 Internet peers? Then your internal network can reference these systems.

Conventional wisdom says that 3 is what you need for quorum, but you need to be tolerant of at least one being determined as a falseticker or going offline.

Please see: 5.3.3. Upstream Time Server Quantity
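For illustration, a minimal sketch of that arrangement, reusing the RHEL pool names from your existing config and treating the internal hostnames as placeholders (not a complete ntp.conf):

# on each of server1 through server4
server 0.rhel.pool.ntp.org iburst
server 1.rhel.pool.ntp.org iburst
server 2.rhel.pool.ntp.org iburst
server 3.rhel.pool.ntp.org iburst

# on every internal client
server server1 iburst
server server2 iburst
server server3 iburst
server server4 iburst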

Additionally, you mention weirdness and issues with your current configuration. It would help to see the output of ntpq -p for the relevant hosts.

  • Sorry for the delay. I believe the weirdness is related to the directives in the conf file, because funnelling everything through one server is restricting proper sync. Let me edit the question with the required output of `ntpq -p` – Edgar Sampere Aug 30 '17 at 15:48
  • After designing the new architecture with all your comments, my only remaining question is: is it better to have servers [1-4] all pointing to the same 4 public servers, or to point every server to a different set of public servers? What would be best? – Edgar Sampere Sep 08 '17 at 14:55
  • I would just point them to the pool address. Hosts come in and out of the pool regularly. – Aaron Copley Sep 08 '17 at 18:55
  • But I mean, every server pointing to different pools? Or all 4 to the same pool? – Edgar Sampere Sep 08 '17 at 18:59
  • You should use the broadest or *least specific* pool address that gets you good results. See this for more information: http://www.pool.ntp.org/en/use.html – Aaron Copley Sep 08 '17 at 19:16
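For example, the broadest form of this would be the generic pool zone names rather than a vendor- or country-specific zone (a sketch; see the pool usage page linked above for the current recommendations):

server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
server 3.pool.ntp.org iburst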

While it's not strictly true that 2 servers are no use, the NTP Best Current Practices RFC (8633) recommends 4 as a minimum. NTP's intersection algorithm doesn't depend merely on a quorum in the number of servers, but also on the quality of the time they return - and you can't predict that. So the more the better. There is no problem having up to 10 upstream NTP servers.

As Aaron mentioned, your proposed servers 1-4 should all point to upstream NTP servers, and your internal systems should point to all 4 of them. Servers 1-4 can also peer with each other (in symmetric mode), but that's not strictly required.
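If you do add the optional peering, a minimal sketch of what that might look like on server1 (with the mirror-image lines on the other three); the hostnames are the placeholders from the question:

# optional: symmetric peering between the four internal servers
peer server2
peer server3
peer server4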

It's important to understand why you shouldn't funnel NTP through a single server at any point in your architecture: NTP requires multiple servers for accuracy, not just redundancy (see the NTP docs for a description of the algorithms, which explains why). (Shameless plug: I've written more about this elsewhere, including suggestions for architecture.)

  • Good point regarding quality! :) – Aaron Copley Aug 30 '17 at 02:33
  • Thanks! After reading the NTP RFC and several references trying to understand how the driftfile and NTP sync work, I now believe having more servers is better for accuracy. For servers 1-4, should I remove the nopeer and noquery directives to sync properly? – Edgar Sampere Aug 30 '17 at 15:47
  • I was also reading that I need to remove the `restrict 127.0.0.1` line in my NTP server conf file. Is this true? I mean, on the 4 servers acting as local upstreams in the proposed architecture. – Edgar Sampere Aug 30 '17 at 15:50
  • @EdgarSampere Your current default and 127.0.0.1 restrictions look good and shouldn't be changed. What you should do is add a new one matching your servers something like: "restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap noquery" - nopeer is the only thing that needs to change for servers 1-4. – Paul Gear Aug 30 '17 at 23:43
  • @PaulGear What would the new restrict be used for? I still don't fully understand noquery. – Edgar Sampere Aug 31 '17 at 18:09
  • @PaulGear Btw, your blog's interesting, it's just a little "Advance warning: This is a long post". – Edgar Sampere Aug 31 '17 at 18:17
  • @EdgarSampere 'noquery' stops remote ntpq & such; see http://doc.ntp.org/current-stable/accopt.html for details. – Paul Gear Sep 01 '17 at 06:07
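Putting the restrict advice from this thread together, a sketch of how the relevant section on servers 1-4 might end up (192.168.1.0/24 is just the example subnet from the comment above; substitute your own network):

# defaults unchanged from the posted ntp.conf
restrict default kod nomodify notrap nopeer noquery
restrict 127.0.0.1

# local network: same flags as the default but without nopeer,
# so servers 1-4 can form symmetric peer associations with each other;
# ordinary time service to clients is allowed either way
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap noquery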