
I have a set of four ntpd servers that sync time from the same stratum 1 server, but on some clients they are all marked as falsetick. Why?

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
x10.201.24.36    209.51.161.238   2 u   12   64  377    0.177  464.794   1.136
x10.201.13.99    209.51.161.238   2 u   37   64  377    0.148  463.427   0.541
x10.201.24.37    209.51.161.238   2 u  817 1024  377    0.174  462.235   0.143
x10.201.12.198   209.51.161.238   2 u  853 1024  377    0.158  462.151 302.364
*127.127.1.0     .LOCL.          10 l   48   64  377    0.000    0.000   0.004

They recover from time to time, but why does it happen at all? Another question: why do I have such a big offset? I tried leaving only one NTP server, but the offset doesn't go down.

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.201.12.198   209.51.161.238   2 u   64   64    7    0.108  470.963   0.200
 127.127.1.0     .LOCL.          10 l  136   64   14    0.000    0.000   0.001

I am running CentOS 7 and all servers are on the same network.

ntp.conf:

driftfile /var/lib/ntp/drift
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod limited nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict lga-ntp03.pulse.prod
server lga-ntp03.pulse.prod iburst burst
restrict lga-ntp06.pulse.prod
server lga-ntp06.pulse.prod iburst burst
restrict lga-ntp05.pulse.prod
server lga-ntp05.pulse.prod iburst burst
restrict lga-ntp01.pulse.prod
server lga-ntp01.pulse.prod iburst burst
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys

Update #1: After removing LOCL (thanks to @John Mahowald), my servers are still marked as falsetick:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
x10.201.24.36    209.51.161.238   2 u  886 1024    7    0.177  -330.92   0.172
*10.201.13.99    209.51.161.238   2 u  773 1024   17    0.203  -152.68   0.090
x10.201.24.37    209.51.161.238   2 u  750 1024   17    0.167   94.101   0.468
x10.201.12.198   209.51.161.238   2 u  409 1024   17    0.129   51.831   0.176
poison1456

2 Answers


Undisciplined Local Clock (refid LOCL) should not be used. Remove it.

The Undisciplined Local Clock should generally no longer be used.

It was originally designed to be used when an ntpd must be able to serve time to others even when no real time sources are reachable.

The Undisciplined Local Clock is not a backup for a leaf-node (i.e. client-only) ntpd instance.

In theory, if an NTP server had a good oscillator, it could serve tolerably accurate time without any other sources.

It falls down because the clocks on most computers have rubbish accuracy compared to sources such as the NTP Pool. However, LOCL has zero delay, and low delay usually means low error. So the likely-good NTP sources you want to use are thrown out by the intersection algorithm, because none of them is close to the zero-delay one. Except the local clock isn't the accurate one...

A 460 ms offset is a few days of drift on many systems. It could be lower, but the selection problems began when LOCL was introduced. Also, disciplining a large offset without stepping takes many hours.
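To put the 460 ms figure in perspective, here is some back-of-the-envelope arithmetic. It assumes "a few days" means three days (my assumption, not from the thread) and uses the 500 ppm (0.5 ms per second) maximum slew rate documented for ntpd. The real discipline loop normally corrects far more gently than that cap, so the slew time below is only a lower bound.

```python
# Back-of-the-envelope arithmetic for a 460 ms offset.
# Assumption: "a few days" is taken as 3 days purely for illustration.

OFFSET_S = 0.460      # observed offset, in seconds
DAYS = 3              # assumed accumulation period, in days
MAX_SLEW_PPM = 500    # ntpd's documented maximum slew rate (0.5 ms/s)


def implied_drift_ppm(offset_s: float, days: float) -> float:
    """Constant frequency error (ppm) that would accumulate offset_s over `days`."""
    seconds = days * 86_400
    return offset_s / seconds * 1e6


def min_slew_seconds(offset_s: float, max_slew_ppm: float = MAX_SLEW_PPM) -> float:
    """Lower bound on the time needed to slew away offset_s at the maximum rate."""
    return offset_s / (max_slew_ppm * 1e-6)


if __name__ == "__main__":
    print(f"implied drift: {implied_drift_ppm(OFFSET_S, DAYS):.2f} ppm")
    print(f"minimum slew time: {min_slew_seconds(OFFSET_S):.0f} s")
```

Under these assumptions the implied drift is under 2 ppm, well within what ordinary crystal oscillators can do, and even slewing at the maximum rate would need at least about 920 s.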


After you removed LOCL, you still have sources rejected as outliers by the intersection algorithm.

Despite all using refid clock.nyc.he.net, and all being about 0.2 ms away, your NTP servers are still up to about 400 ms apart from each other. That is terrible performance: many standard deviations off, given the expected error.
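To quantify this, the sketch below takes the delay and offset columns from the `ntpq -pn` output in Update #1 (copied into the script by hand) and compares the spread of the offsets with what the delays could explain. It relies on the standard NTP bound that, because the one-way delays are unknown, an offset measurement can be wrong by at most half the round-trip delay. It ignores the servers' own root delay and dispersion, so treat it as a simplification.

```python
from __future__ import annotations

# Compare the spread of the servers' offsets with the error that
# their round-trip delays could explain.
# Data: (delay, offset) in ms, copied from the Update #1 ntpq -pn output.
SERVERS = {
    "10.201.24.36": (0.177, -330.92),
    "10.201.13.99": (0.203, -152.68),
    "10.201.24.37": (0.167, 94.101),
    "10.201.12.198": (0.129, 51.831),
}


def offset_spread(servers: dict[str, tuple[float, float]]) -> float:
    """Largest difference between any two servers' offsets, in ms."""
    offsets = [offset for _, offset in servers.values()]
    return max(offsets) - min(offsets)


def max_explainable_spread(servers: dict[str, tuple[float, float]]) -> float:
    """Largest disagreement that path asymmetry alone could explain, in ms.

    Each offset may be off by at most delay/2, so two servers can disagree
    by at most the sum of their two half-delays; take the two largest.
    """
    half_delays = sorted((delay / 2 for delay, _ in servers.values()), reverse=True)
    return half_delays[0] + half_delays[1]


if __name__ == "__main__":
    spread = offset_spread(SERVERS)
    bound = max_explainable_spread(SERVERS)
    print(f"offset spread:      {spread:.3f} ms")
    print(f"explainable spread: {bound:.3f} ms")
    print(f"ratio:              {spread / bound:.0f}x")
```

With these figures the offsets disagree by about 425 ms, while the delays can account for at most about 0.19 ms of disagreement, which points at the servers' clocks rather than the network path.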

Do a rolling restart of the ntp service on your servers, or otherwise force a step correction.

Also investigate what your NTP servers are hosted on. They should be lightly loaded, running little besides ntpd (or chronyd). VMs are acceptable, but make sure the physical hosts are syncing to the same time service, because hypervisors can also sync guest clocks from the host.

John Mahowald
  • Thanks! I removed LOCL, but my ntp servers are still marked as falsetick. Please see the original question; I updated it with new `ntpq -pn` output. – poison1456 Dec 09 '19 at 11:01
  • You have large offsets between your NTP servers for some reason. See edit. – John Mahowald Dec 09 '19 at 14:59
  • The problem is not caused by `LOCL` being used; it's the offset of the reference times! – U. Windl Dec 30 '19 at 13:54
  • Both LOCL and poorly performing peers were problems. The latter became more obvious once the useless LOCL was removed. That said, if sub-second offsets were the only requirement, this could be adequate even though the intersection algorithm doesn't like it. – John Mahowald Dec 30 '19 at 16:15

As John Mahowald's comment indicated, your remote servers don't agree with each other. Here's a visualisation of the offset and delay values:

[Image: NTP visualisation of your setup]

The black horizontal bars are the estimated time of each remote server, and they should have purple-ish bars around them showing the range of time values your remote servers could possibly have. (We don't see them in this graph because you are very close to your sources on the network.) These ranges should overlap, but for your sources they don't.
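NTP's intersection algorithm is derived from Marzullo's algorithm, which finds the point covered by the largest number of source intervals. The sketch below is a simplified Marzullo-style sweep: it assumes each source's interval is offset ± delay/2 (real ntpd uses the root distance, which also includes dispersion and upstream delay) and uses the Update #1 figures. The "healthy" sources in the usage are hypothetical values chosen for illustration.

```python
from __future__ import annotations

# Simplified Marzullo-style sweep: how many source intervals share a point?
# Simplification: interval = offset +/- delay/2 (real ntpd uses root distance).
# Data: (name, offset, delay) in ms from the Update #1 ntpq -pn output.
SOURCES = [
    ("10.201.24.36", -330.92, 0.177),
    ("10.201.13.99", -152.68, 0.203),
    ("10.201.24.37", 94.101, 0.167),
    ("10.201.12.198", 51.831, 0.129),
]


def max_overlap(sources: list[tuple[str, float, float]]) -> tuple[int, float | None]:
    """Return (largest number of overlapping intervals, a point where it occurs)."""
    edges = []
    for _, offset, delay in sources:
        edges.append((offset - delay / 2, -1))  # interval start
        edges.append((offset + delay / 2, +1))  # interval end
    # Sort by position; at equal positions starts (-1) come before ends (+1),
    # so intervals that merely touch still count as overlapping.
    edges.sort()
    best, count, best_point = 0, 0, None
    for position, kind in edges:
        count -= kind  # a start adds 1, an end subtracts 1
        if count > best:
            best, best_point = count, position
    return best, best_point


if __name__ == "__main__":
    count, _ = max_overlap(SOURCES)
    print(f"your sources: largest overlapping group = {count} of {len(SOURCES)}")
    # Hypothetical healthy sources: offsets within 0.15 ms, 0.5 ms delay each.
    healthy = [("a", 0.1, 0.5), ("b", -0.05, 0.5), ("c", 0.02, 0.5)]
    count, _ = max_overlap(healthy)
    print(f"healthy example: largest overlapping group = {count} of {len(healthy)}")
```

With the Update #1 figures the largest group is a single source: no two intervals overlap, so there is no majority for the selection algorithm to trust, whereas all three hypothetical healthy sources overlap.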

Here's an example visualisation of a working NTP setup:

[Image: NTP visualisation of a working setup]

As you can see, the second example has all of the centre lines close together (and close to zero offset), and a large range of overlap.

So the summary is: your time sources are serving inaccurate time, and NTP rightly doesn't trust them.

Paul Gear
  • It could be that the configured time servers provide inaccurate time, *but* it could also be that the local machine or the network path (DSL?) is highly loaded, causing significant delay. – U. Windl Dec 30 '19 at 13:56
  • Generally speaking, that could be the case, but the delay figures in the hundreds of microseconds in the `ntpq` output demonstrate that's not the problem here. – Paul Gear Jan 02 '20 at 04:35