Recently, several SLES12.5 VMs in my domain keep hitting NTP sync issues, so I did some research on it. Here are the details:
- I found one VM that often raises NTP issues, so I started a monitoring job on it that runs "ntpq -pn" every second. Yesterday I found it had lost sync with the NTP servers again:
none of the NTP servers responded from 2022-07-22T05:16:34 onward, and tcpdump confirmed it: from that moment on, no packets from the NTP servers came back to this VM.
So I checked with ntpq:
vsa10027077:/tmp/eisen # ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*127.127.1.0     .LOCL.          10 l   18   64  377    0.000   +0.000   0.000
 147.204.9.202   162.159.200.1    4 u   5h 1024    0    2.168   -0.374   0.000
 147.204.9.203   162.159.200.123  4 u   5h 1024    0    2.411   +1.608   0.000
 147.204.9.204   162.159.200.1    4 u   5h 1024    0    1.917   -0.418   0.000
vsa10027077:/tmp/eisen # ntpq
ntpq> as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 26549  961a   yes   yes  none  sys.peer    sys_peer  1
  2 26550  8013   yes    no  none    reject unreachable  1
  3 26551  8013   yes    no  none    reject unreachable  1
  4 26552  8013   yes    no  none    reject unreachable  1
ntpq> rv 26550
associd=26550 status=8013 conf, sel_reject, 1 event, unreachable,
srcadr=147.204.9.202, srcport=123, dstadr=100.78.59.192, dstport=123,
leap=00, stratum=4, precision=-23, rootdelay=22.659, rootdisp=38.574,
refid=162.159.200.1,
reftime=e684ba76.20e3a34f Fri, Jul 22 2022 5:56:06.128,
rec=e684bf98.7a92b5e4 Fri, Jul 22 2022 6:18:00.478, reach=000,
unreach=28, hmode=3, pmode=4, hpoll=10, ppoll=10, headway=44,
flash=1400 peer_dist, peer_unreach, keyid=0, offset=-0.374, delay=2.168,
dispersion=15937.500, jitter=0.000, xleave=0.071,
filtdelay= 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00,
filtoffset= +0.00 +0.00 +0.00 +0.00 +0.00 +0.00 +0.00 +0.00,
filtdisp= 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0
The flash value for all the servers is 1400, i.e. 0x1000 (peer_unreach: unreachable or nonselect) plus 0x0400 (peer_dist: distance threshold exceeded).
- Since ntpq said the NTP server considers my VM's distance too large, I checked with ping and traceroute.
ping shows a reply TTL of 252 and a round-trip time of only 1.35 ms with no packet loss, and traceroute shows only 4 hops from the client to the NTP server:
vsa10027077:/tmp/eisen # traceroute 147.204.9.202
traceroute to 147.204.9.202 (147.204.9.202), 30 hops max, 60 byte packets
1 host-100-78-56-1.fra1.od.sap.biz (100.78.56.1) 0.332 ms 0.316 ms 0.309 ms
2 130.214.162.65 (130.214.162.65) 0.829 ms 1.317 ms 1.047 ms
3 10.46.210.132 (10.46.210.132) 1.014 ms 1.278 ms 10.46.210.131 (10.46.210.131) 1.166 ms
4 10.46.210.129 (10.46.210.129) 3.102 ms * *
So I stopped the ntpd service, set the time manually with "ntpdate" (the offset was very small), and restarted ntpd, but sadly the NTP server was still rejecting this VM:
vsa10027077:/tmp/eisen # systemctl stop ntpd
vsa10027077:/tmp/eisen # ntpdate 147.204.9.202
22 Jul 11:33:37 ntpdate[30877]: adjust time server 147.204.9.202 offset +0.000069 sec
vsa10027077:/tmp/eisen # systemctl start ntpd
Then I added "minpoll 3 maxpoll 6" to each server line in /etc/ntp.conf and restarted ntpd, but it still didn't work.
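The hex and octal words in the ntpq output above can be decoded mechanically. They are maintained by the local ntpd for each server association, so they record how this client judged each server. A minimal sketch, assuming the bit layouts given in the ntpq documentation for ntpd 4.2.8 (flash test bits, the 16-bit peer status word, the octal reach register); the function names are hypothetical, `EVENTS` lists only a subset of the event codes, and `monitor()` is an illustrative stand-in for the once-a-second `ntpq -pn` job that reports peers whose reach has dropped to 0.

```python
from __future__ import annotations

import subprocess
import time

# Flash test bits as listed in the ntpq documentation (ntpd 4.2.8).
FLASH_BITS = {
    0x0001: "pkt_dupe", 0x0002: "pkt_bogus", 0x0004: "pkt_unsync",
    0x0008: "pkt_denied", 0x0010: "pkt_auth", 0x0020: "pkt_stratum",
    0x0040: "pkt_header", 0x0080: "pkt_autokey", 0x0100: "pkt_crypto",
    0x0200: "peer_stratum", 0x0400: "peer_dist", 0x0800: "peer_loop",
    0x1000: "peer_unreach",
}
# Peer status flags: the top 5 bits of the 16-bit peer status word.
PEER_FLAGS = {0x10: "conf", 0x08: "auth", 0x04: "authenb", 0x02: "reach", 0x01: "bcst"}
# Peer selection codes: bits 10-8 of the status word.
SELECT = ["sel_reject", "sel_falsetick", "sel_excess", "sel_outlier",
          "sel_candidate", "sel_backup", "sel_sys.peer", "sel_pps.peer"]
# Subset of the peer event codes (bits 3-0 of the status word).
EVENTS = {3: "unreachable", 4: "reachable", 7: "rate_exceeded",
          8: "access_denied", 10: "sys_peer"}
TALLY = "x.-+#*o"  # tally codes `ntpq -p` prefixes to the remote address


def decode_flash(flash_hex: str) -> list[str]:
    """Names of the tests that failed, e.g. '1400' -> peer_dist, peer_unreach."""
    value = int(flash_hex, 16)
    return [name for bit, name in sorted(FLASH_BITS.items()) if value & bit]


def decode_status(status_hex: str) -> dict:
    """Split a peer status word such as '8013' into its four fields."""
    word = int(status_hex, 16)
    flags = word >> 11          # bits 15-11: peer status flags
    select = (word >> 8) & 0x7  # bits 10-8: selection code
    count = (word >> 4) & 0xF   # bits 7-4: event counter
    code = word & 0xF           # bits 3-0: most recent event code
    return {
        "flags": [name for bit, name in PEER_FLAGS.items() if flags & bit],
        "select": SELECT[select],
        "event_count": count,
        "last_event": EVENTS.get(code, f"event {code}"),
    }


def polls_answered(reach_octal: str) -> int:
    """reach is an 8-bit shift register printed in octal; count the replies."""
    return bin(int(reach_octal, 8)).count("1")


def unreachable_peers(ntpq_pn: str) -> list[str]:
    """Remotes whose reach register is 0 in `ntpq -pn` output."""
    lost = []
    for line in ntpq_pn.splitlines()[2:]:  # skip the header and '====' rule
        fields = line.split()
        if len(fields) != 10:
            continue
        remote = fields[0][1:] if fields[0][0] in TALLY else fields[0]
        if polls_answered(fields[6]) == 0:
            lost.append(remote)
    return lost


def monitor(interval_s: float = 1.0) -> None:
    """Illustrative loop: run `ntpq -pn` every interval_s seconds, log reach=0 peers."""
    while True:
        out = subprocess.run(["ntpq", "-pn"], capture_output=True, text=True).stdout
        lost = unreachable_peers(out)
        if lost:
            print(time.strftime("%Y-%m-%dT%H:%M:%S"), "reach=0:", ", ".join(lost))
        time.sleep(interval_s)
```

For example, `decode_flash("1400")` returns `["peer_dist", "peer_unreach"]`, and `decode_status("8013")` yields `conf`, `sel_reject`, one event, `unreachable`, matching what ntpq printed for association 26550.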
I'm confused: the NTP server says my VM's distance is too large and rejects it, yet both ping and traceroute show only a few hops between them. What causes this issue? How do the NTP servers determine the distance to a client? And how can I fix it? Please kindly share your comments. Thanks in advance for your help.
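The hop-count reasoning from the ping output can be made explicit. Each router decrements the TTL by one, so a received TTL reveals the hop count only once you assume the sender's initial TTL. A minimal sketch, assuming the NTP server starts from one of the common defaults (64, 128 or 255; which one it actually uses is unknown): under that assumption a reply TTL of 252 means three routers on the return path, roughly in line with the short traceroute. The function name is hypothetical.

```python
# Initial TTLs commonly used by IP stacks; which one the NTP server uses is an assumption.
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_router_hops(observed_ttl: int) -> int:
    """Estimate how many routers decremented a reply's TTL on the way back.

    Assumes the sender started from the smallest common initial TTL that is
    not below the observed value; each router subtracts one.
    """
    if not 0 < observed_ttl <= 255:
        raise ValueError(f"invalid TTL: {observed_ttl}")
    initial = next(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl


# The ping reply TTL reported above.
print(estimate_router_hops(252))  # 3 routers if the server started at 255
```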
Update:
Here is the ntpd config file:
vsa10027077:~ # cat /etc/ntp.conf
driftfile /var/lib/ntp/drift/ntp.drift
logfile /var/log/ntp
server 127.127.1.0
fudge 127.127.1.0 stratum 10
server timehost1.global.cloud.sap
server timehost2.global.cloud.sap
server timehost3.global.cloud.sap
# key configuration
keys /etc/ntp.keys
trustedkey 1
requestkey 1
controlkey 1
# by default act only as a basic NTP client
restrict default kod nomodify noquery notrap nopeer
restrict -6 default kod nomodify noquery notrap nopeer
restrict 127.0.0.1
restrict ::1
# allow NTP messages from the loopback address, useful for debugging
restrict localhost
### end of file
However, the servers haven't stopped responding in the last 2 days, so I couldn't capture the output of "ntpq -c rv 0" during an incident. Here is the output from a normal period:
vsa10027077:~ # ntpq -c rv 0
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.8p15@1.3728-o Mon Jun 21 18:17:38 UTC 2021 (1)",
processor="x86_64", system="Linux/4.12.14-122.124-default", leap=00,
stratum=5, precision=-24, rootdelay=26.314, rootdisp=51.471,
refid=147.204.9.204,
reftime=e689d98d.602a4dc4 Tue, Jul 26 2022 3:10:05.375,
clock=e689d9d4.bea84735 Tue, Jul 26 2022 3:11:16.744, peer=2989, tc=5,
mintc=3, offset=+0.212857, frequency=+2.033, sys_jitter=0.876471,
clk_jitter=0.843, clk_wander=0.063
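The distance behind ntpd's peer_dist flag is the root synchronization distance, which is built from delay and dispersion values rather than from a hop count. As a sketch, assuming the root-distance formula and distance test from RFC 5905 with ntpd's default maxdist of 1.5 s (and omitting two small correction terms), the `rv 26550` fields captured during the incident can be set next to the healthy `rv 0` values above. All eight filter stages in the incident output show 16000 ms, and their weighted sum reproduces the dispersion=15937.500 that ntpq printed. The helper names are hypothetical.

```python
from __future__ import annotations

# Sketch of the root-distance arithmetic, following RFC 5905; all values in ms
# as printed by `ntpq rv`. Two small terms (the MINDISP floor and the growth
# since the last update) are omitted.
MAXDIST_MS = 1500.0  # ntpd's default `tos maxdist` of 1.5 s
PHI = 15e-6          # frequency tolerance: 15 PPM, in s/s


def peer_dispersion(filtdisp_ms: list[float]) -> float:
    """Weight the clock-filter stage dispersions by 1/2, 1/4, ... and sum them.

    Assumes the stages are already in the filter's sort order; in the incident
    output all eight are 16000.0 ms, so the order does not matter there.
    """
    return sum(d / 2 ** (i + 1) for i, d in enumerate(filtdisp_ms))


def root_distance(delay: float, rootdelay: float, dispersion: float,
                  rootdisp: float, jitter: float) -> float:
    """(delay + rootdelay) / 2 + dispersion + rootdisp + jitter, in ms."""
    return (delay + rootdelay) / 2 + dispersion + rootdisp + jitter


def distance_threshold_ms(hpoll: int) -> float:
    """maxdist plus PHI times the poll interval (2**hpoll s), in ms."""
    return MAXDIST_MS + PHI * 2 ** hpoll * 1000.0


# `ntpq> rv 26550` during the incident: filtdisp is 16000.0 in all 8 stages.
incident = root_distance(delay=2.168, rootdelay=22.659,
                         dispersion=peer_dispersion([16000.0] * 8),
                         rootdisp=38.574, jitter=0.0)
# `ntpq -c rv 0` on the healthy host (the host's own rootdelay/rootdisp).
normal = root_distance(delay=0.0, rootdelay=26.314, dispersion=0.0,
                       rootdisp=51.471, jitter=0.0)

print(f"incident: {incident:.1f} ms, threshold: {distance_threshold_ms(10):.1f} ms")
print(f"normal:   {normal:.1f} ms")
```

With these inputs the incident distance comes out near 15988 ms against a threshold of roughly 1515 ms (hpoll=10), while the healthy host's own figure is about 65 ms; the dispersion term, apparently driven by the unanswered polls, dominates.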
Please have a look. Thanks.
Update 2022-08-09: I added "minpoll 3 maxpoll 6" to every server line in /etc/ntp.conf and restarted ntpd. The rejection still happens, but it lasts much less than before: it used to last 30+ hours, and now the host is back to normal after about 3 hours. But I'm still confused: I set maxpoll to 6, which means the maximum poll interval should be 2^6 = 64 seconds, yet ntpq already shows 256 seconds or more (512 in the output below)...
vsa9973928:/tmp/eisen # cat /etc/ntp.conf
driftfile /var/lib/ntp/drift/ntp.drift
logfile /var/log/ntp
server 127.127.1.0
fudge 127.127.1.0 stratum 10
server timehost1.global.cloud.sap minpoll 3 maxpoll 6
server timehost2.global.cloud.sap minpoll 3 maxpoll 6
server timehost3.global.cloud.sap minpoll 3 maxpoll 6
# key configuration
keys /etc/ntp.keys
trustedkey 1
requestkey 1
controlkey 1
# by default act only as a basic NTP client
restrict default kod nomodify noquery notrap nopeer
restrict -6 default kod nomodify noquery notrap nopeer
restrict 127.0.0.1
restrict ::1
# allow NTP messages from the loopback address, useful for debugging
restrict localhost
### end of file
vsa9973928:/tmp/eisen # ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l 220m   64    0    0.000   +0.000   0.000
+147.204.9.202   10.46.141.8      5 u   40  512  377    1.742   +0.128   1.060
+147.204.9.203   162.159.200.123  4 u  274  512  377    1.730   +1.539   2.245
*147.204.9.204   162.159.200.1    4 u  148  512  377    1.803   +0.585   0.900
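The poll limits are plain powers of two: a poll exponent n means 2^n seconds, so "minpoll 3 maxpoll 6" should keep the interval between 8 and 64 s, while ntpq shows 512 s (2^9). A small sketch that flags this mismatch, assuming a simplified ntp.conf parser (only `server` lines, ntpd's documented defaults of minpoll 6 and maxpoll 10 for missing options, reference-clock pseudo-addresses skipped) and that ntpq abbreviates long intervals with m/h/d suffixes; the helper names are hypothetical. It only detects the discrepancy and does not explain it.

```python
from __future__ import annotations

DEFAULT_MINPOLL, DEFAULT_MAXPOLL = 6, 10  # ntpd defaults: 2**6 = 64 s, 2**10 = 1024 s
TALLY = "x.-+#*o"                         # tally codes `ntpq -p` prefixes to a remote
UNITS = {"m": 60, "h": 3600, "d": 86400}  # assumed ntpq abbreviations for long intervals


def poll_limits(conf_text: str) -> dict[str, tuple[int, int]]:
    """Map each `server` host in ntp.conf to its (min, max) poll interval in seconds."""
    limits = {}
    for line in conf_text.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or tokens[0] != "server":
            continue
        if tokens[1].startswith("127.127."):  # reference-clock pseudo-address: skipped
            continue
        minpoll, maxpoll = DEFAULT_MINPOLL, DEFAULT_MAXPOLL
        # Pair each token with its successor so flag options like `iburst` are harmless.
        for option, value in zip(tokens[2:], tokens[3:]):
            if option == "minpoll":
                minpoll = int(value)
            elif option == "maxpoll":
                maxpoll = int(value)
        limits[tokens[1]] = (2 ** minpoll, 2 ** maxpoll)
    return limits


def interval_seconds(text: str) -> int:
    """Convert an ntpq interval such as '512' or '5h' to seconds."""
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def polls_above(ntpq_pn: str, limit_s: int) -> list[str]:
    """Remotes in `ntpq -pn` output whose poll column exceeds limit_s seconds."""
    over = []
    for line in ntpq_pn.splitlines()[2:]:  # skip the header and '====' rule
        fields = line.split()
        if len(fields) != 10:
            continue
        remote = fields[0][1:] if fields[0][0] in TALLY else fields[0]
        if interval_seconds(fields[5]) > limit_s:
            over.append(remote)
    return over
```

Run against the vsa9973928 config and ntpq output above, `poll_limits` gives (8, 64) for each timehost, and `polls_above(..., 64)` lists all three 147.204.9.x servers, since each shows a poll of 512 s.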
What makes the poll interval exceed the limit set in ntp.conf? Has anyone seen this before?