I've got a remote unmanned server that's been exhibiting some extremely strange clock/NTP behavior lately. Symptoms:
- Very high jitter
- ntpq -pn returns:
  - A 'when' count that keeps resetting to 1, even though the NTP server is literally 1 m of CAT5e away and directly connected to the machine in question. There are no signs of packet loss or any other communication breakdown.
  - Frequently, a refid of 'LOCAL(0)', even though I know the NTP server in question has no trouble reaching its stratum 2 server.
admin@machine:~$ date && ntpq -pn
Thu 24 May 19:34:02 UTC 2018
remote refid st t when poll reach delay offset jitter
==============================================================================
<local_ntpserver> LOCAL(0) 15 u 120 128 377 0.120 -486.68 909.283

admin@machine:~$ date && ntpq -pn
Thu 24 May 19:38:37 UTC 2018
remote refid st t when poll reach delay offset jitter
==============================================================================
<local_ntpserver> <remote_ntpserver> 3 u 1 128 377 0.123 -1854.0 2164.83
From the local NTP server (i.e. the machine running at the same physical location):
remote refid st t when poll reach delay offset jitter
==============================================================================
<remote_ntpserver> <remote_ntpserver2> 2 u 49 64 377 5076.18 1546.21 299.468
You can see that this local NTP server has good reach and relatively low jitter, despite being across a high-latency wireless network.
I've set minpoll and maxpoll to low values (4 and 5) on the primary machine so that ntpd polls more frequently. This band-aid seems to keep the primary machine somewhat tethered to reality (previously it was drifting minutes away several times a day), but I'd like to get to the root of this weird behavior.
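For reference, ntpd interprets minpoll and maxpoll as powers of two in seconds, so the values above work out to fairly short intervals. A minimal sketch of the arithmetic (the helper name `poll_seconds` is made up for illustration):

```python
def poll_seconds(exponent: int) -> int:
    """Convert an ntpd poll exponent (minpoll/maxpoll) to seconds."""
    return 2 ** exponent


if __name__ == "__main__":
    # Values taken from the post: minpoll 4, maxpoll 5.
    for name, exponent in (("minpoll", 4), ("maxpoll", 5)):
        print(f"{name} {exponent} = {poll_seconds(exponent)} s")
```

The `poll` column in `ntpq -pn` output is shown in seconds rather than as an exponent, so a value of 128 there corresponds to an exponent of 7.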
I have a theory that the TSC clock source could be drifting wildly, but I have no evidence of this. It would explain the high jitter, though, and that in turn might be behind some of the odd NTP behavior.
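One way to start gathering evidence for or against the TSC theory is to check which clock source the kernel is actually using. The sketch below assumes a Linux machine that exposes the usual sysfs clocksource files; the function name `read_clocksource` is hypothetical, and the script only reports what it finds rather than diagnosing anything.

```python
from pathlib import Path

# Standard Linux sysfs location for clock source information (assumed present).
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base: Path = SYSFS_DIR):
    """Return (current, available) clock sources, or (None, []) if unreadable."""
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Files missing or unreadable, e.g. not running on Linux.
        return None, []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksource()
    print("current clock source:", current)
    print("available clock sources:", ", ".join(available))
```

If the current source turns out to be `tsc`, the kernel log (`dmesg`) may also be worth checking for messages about the clock source being marked unstable.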
Regardless, I don't understand why the refid keeps reverting to 'LOCAL(0)' when the NTP server is clearly still reaching its upstream server. The NTP service is not restarting. For example:
● ntp.service - LSB: Start NTP daemon
   Loaded: loaded (/etc/init.d/ntp)
   Active: active (running) since Wed 2018-05-23 15:58:50 UTC; 1 day 3h ago
Yet I've observed numerous reversions to 'LOCAL(0)' in the last few hours, so it's not as if the daemon is starting from scratch and needs time to initialize or collect the right data.
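To pin down exactly when the reversions happen, the `ntpq -pn` output could be sampled periodically and each peer line checked for a LOCAL refid. The following is a rough sketch, assuming the ten-column peer layout shown in the output above; the names `parse_peer_line` and `uses_local_clock` are made up for illustration, and the sample line is copied from the first capture.

```python
# Tally characters ntpq may place in front of the remote address.
TALLY_CHARS = "*+-#xo.~"


def parse_peer_line(line: str):
    """Parse one peer line of `ntpq -pn` output; return None for other lines."""
    if line[:1] in TALLY_CHARS:
        line = line[1:]  # drop the tally character, keep the address
    fields = line.split()
    if len(fields) != 10:
        return None  # separator or blank line
    remote, refid, st, kind, when, poll, reach, delay, offset, jitter = fields
    try:
        return {
            "remote": remote,
            "refid": refid,
            "stratum": int(st),
            "type": kind,
            "when": when,  # kept as text: may be "-" or carry a unit suffix
            "poll_s": int(poll),
            "reach": reach,  # octal shift register, kept as text
            "delay_ms": float(delay),
            "offset_ms": float(offset),
            "jitter_ms": float(jitter),
        }
    except ValueError:
        return None  # e.g. the column header line


def uses_local_clock(peer: dict) -> bool:
    """True if the peer reports its own local clock as its reference."""
    return peer["refid"].startswith("LOCAL")


if __name__ == "__main__":
    sample = "<local_ntpserver> LOCAL(0) 15 u 120 128 377 0.120 -486.68 909.283"
    peer = parse_peer_line(sample)
    print(peer["remote"], "on local clock:", uses_local_clock(peer))
```

Running something like this from cron with a timestamp on each sample would show whether the LOCAL(0) periods line up with the large offset and jitter excursions.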