
Recently, several SLES12.5 VMs in my domain have repeatedly run into an NTP sync issue, so I did some research on it. Here are the details:

  1. I found one VM that often raises NTP issues, so I started a monitoring job on it that runs "ntpq -pn" every second. Yesterday it lost sync with the NTP servers again:

All NTP servers stopped responding at 2022-07-22T05:16:34. tcpdump confirms this: from that moment on, no packets from the NTP servers came back to this VM.

So I checked with the ntpq command:

vsa10027077:/tmp/eisen # ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*127.127.1.0     .LOCL.          10 l   18   64  377    0.000   +0.000   0.000
 147.204.9.202   162.159.200.1    4 u   5h 1024    0    2.168   -0.374   0.000
 147.204.9.203   162.159.200.123  4 u   5h 1024    0    2.411   +1.608   0.000
 147.204.9.204   162.159.200.1    4 u   5h 1024    0    1.917   -0.418   0.000
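
To make the monitoring job easier to evaluate, the `reach` column can be decoded automatically. Below is a minimal sketch, assuming the ten-column layout of `ntpq -pn` shown above; the helper names `parse_peers` and `answered_polls` are my own and not part of any NTP tool. `reach` is printed in octal and records the last eight polls, so 377 means all eight were answered and 0 means none were.

```python
def parse_peers(ntpq_output):
    """Parse the text of `ntpq -pn` into a list of per-peer dicts."""
    peers = []
    for line in ntpq_output.splitlines():
        stripped = line.strip()
        # Skip blank lines, the column header, and the ===== separator.
        if not stripped or stripped.startswith("remote") or stripped.startswith("="):
            continue
        # The first character is the tally code ('*' = sys.peer,
        # '+' = candidate, ' ' = rejected, ...); the rest is whitespace separated.
        tally, fields = line[0], line[1:].split()
        if len(fields) != 10:
            continue  # not a peer row in the layout assumed here
        peers.append({
            "tally": tally,
            "remote": fields[0],
            "poll": int(fields[5]),
            # `reach` is octal: an 8-bit shift register of the last 8 polls.
            "reach": int(fields[6], 8),
        })
    return peers


def answered_polls(peer):
    """Number of the last eight polls that received a reply."""
    return bin(peer["reach"]).count("1")


# Two rows copied from the output above.
SAMPLE = "\n".join([
    "     remote           refid      st t when poll reach   delay   offset  jitter",
    "==============================================================================",
    "*127.127.1.0     .LOCL.          10 l   18   64  377    0.000   +0.000   0.000",
    " 147.204.9.202   162.159.200.1    4 u   5h 1024    0    2.168   -0.374   0.000",
])

for peer in parse_peers(SAMPLE):
    if peer["reach"] == 0:
        print(peer["remote"], "has not answered any of the last 8 polls")
```

In a real job the text would come from running `ntpq -pn` instead of the `SAMPLE` string.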

vsa10027077:/tmp/eisen # ntpq
ntpq> as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 26549  961a   yes   yes  none  sys.peer    sys_peer  1
  2 26550  8013   yes    no  none    reject unreachable  1
  3 26551  8013   yes    no  none    reject unreachable  1
  4 26552  8013   yes    no  none    reject unreachable  1
ntpq> rv 26550
associd=26550 status=8013 conf, sel_reject, 1 event, unreachable,
srcadr=147.204.9.202, srcport=123, dstadr=100.78.59.192, dstport=123,
leap=00, stratum=4, precision=-23, rootdelay=22.659, rootdisp=38.574,
refid=162.159.200.1,
reftime=e684ba76.20e3a34f  Fri, Jul 22 2022  5:56:06.128,
rec=e684bf98.7a92b5e4  Fri, Jul 22 2022  6:18:00.478, reach=000,
unreach=28, hmode=3, pmode=4, hpoll=10, ppoll=10, headway=44,
flash=1400 peer_dist, peer_unreach, keyid=0, offset=-0.374, delay=2.168,
dispersion=15937.500, jitter=0.000, xleave=0.071,
filtdelay=     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00,
filtoffset=   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00,
filtdisp=   16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0

The flash value is 1400 for all three NTP servers, which combines two bits: 1000 (peer_unreach, peer unreachable) and 0400 (peer_dist, distance threshold exceeded).
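
For reference, here is a small sketch of how the flash word can be decoded. The bit names are taken from the flash status table in the ntpq documentation as I understand it; the function name `decode_flash` is hypothetical.

```python
# Flash status bits (hex) and their short names, as listed in the
# ntpq documentation. The first nine are packet tests, the last four peer tests.
FLASH_BITS = {
    0x0001: "pkt_dup",
    0x0002: "pkt_bogus",
    0x0004: "pkt_unsync",
    0x0008: "pkt_denied",
    0x0010: "pkt_auth",
    0x0020: "pkt_stratum",
    0x0040: "pkt_header",
    0x0080: "pkt_autokey",
    0x0100: "pkt_crypto",
    0x0200: "peer_stratum",
    0x0400: "peer_dist",
    0x0800: "peer_loop",
    0x1000: "peer_unreach",
}


def decode_flash(flash_hex):
    """Return the names of all bits set in a flash word such as '1400'."""
    value = int(flash_hex, 16)
    return [name for bit, name in sorted(FLASH_BITS.items()) if value & bit]


print(decode_flash("1400"))
```

For 1400 this gives the same two names that `rv` prints next to the flash value: peer_dist and peer_unreach.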

  2. Since ntpq says the distance between the NTP servers and my VM is too large, I checked with ping and traceroute:

ping shows a TTL of 252 and a round-trip time of only 1.35 ms with no packet loss, and traceroute shows only 4 hops from the client to the NTP server:

vsa10027077:/tmp/eisen # traceroute 147.204.9.202
traceroute to 147.204.9.202 (147.204.9.202), 30 hops max, 60 byte packets
 1  host-100-78-56-1.fra1.od.sap.biz (100.78.56.1)  0.332 ms  0.316 ms  0.309 ms
 2  130.214.162.65 (130.214.162.65)  0.829 ms  1.317 ms  1.047 ms
 3  10.46.210.132 (10.46.210.132)  1.014 ms  1.278 ms 10.46.210.131 (10.46.210.131)  1.166 ms
 4  10.46.210.129 (10.46.210.129)  3.102 ms * *

  3. So I stopped the ntpd service and manually reset the time with "ntpdate" (the offset was tiny), then restarted the ntpd service. Sadly, the NTP server was still rejecting this VM:

    vsa10027077:/tmp/eisen # systemctl stop ntpd

    vsa10027077:/tmp/eisen # ntpdate 147.204.9.202

    22 Jul 11:33:37 ntpdate[30877]: adjust time server 147.204.9.202 offset +0.000069 sec

    vsa10027077:/tmp/eisen # systemctl start ntpd

  4. Then I added "minpoll 3 maxpoll 6" to each NTP server line in /etc/ntp.conf and restarted the ntpd service, but that still did not help.

    I'm confused: the NTP servers are rejected because the distance to my VM is supposedly too large, yet both ping and traceroute show only a small number of hops between them. What causes this issue? How is the distance between a client and an NTP server determined? And how can I fix it? Please share your comments. Thanks in advance for your help.
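
As far as I can tell, the "distance" in peer_dist is not a hop count but the NTP root synchronization distance, which is built from delay and dispersion values. The sketch below is my own reading of that calculation, not code taken from ntpd: it assumes the distance is half the total round-trip delay plus the dispersions and jitter, it leaves out the term that grows with the time since the last update, and it assumes the default threshold is 1.5 s. The function name and the threshold constant are hypothetical; the input numbers are copied from the "rv 26550" output above.

```python
# Assumed default selection threshold (1.5 s), expressed in milliseconds.
ASSUMED_MAX_DISTANCE_MS = 1500.0


def root_distance_ms(delay, rootdelay, dispersion, rootdisp, jitter):
    """Approximate root synchronization distance, all values in ms.

    Simplified: the term that grows with the time since the last
    update is left out, so the real value can only be larger.
    """
    return (delay + rootdelay) / 2.0 + dispersion + rootdisp + jitter


# Values from "rv 26550" for 147.204.9.202 while it was unreachable.
distance = root_distance_ms(
    delay=2.168,
    rootdelay=22.659,
    dispersion=15937.500,
    rootdisp=38.574,
    jitter=0.000,
)

print(f"root distance ~ {distance:.1f} ms")
print("exceeds assumed threshold:", distance > ASSUMED_MAX_DISTANCE_MS)
```

If this reading is right, almost all of the distance comes from dispersion=15937.500, which would make the distance flag a consequence of the missing replies rather than a sign of a long network path.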

Update:

The ntpd config file is:

vsa10027077:~ # cat /etc/ntp.conf

driftfile /var/lib/ntp/drift/ntp.drift
logfile   /var/log/ntp

server 127.127.1.0
fudge  127.127.1.0 stratum 10

server timehost1.global.cloud.sap 
server timehost2.global.cloud.sap 
server timehost3.global.cloud.sap 

# key configuration
keys /etc/ntp.keys
trustedkey 1
requestkey 1
controlkey 1

# by default act only as a basic NTP client
restrict default kod nomodify noquery notrap nopeer
restrict -6 default kod nomodify noquery notrap nopeer
restrict 127.0.0.1
restrict ::1
# allow NTP messages from the loopback address, useful for debugging
restrict localhost

### end of file

However, the no-response issue has not occurred in the last 2 days, so I can't collect the output of "ntpq -c rv 0" from a time when the problem is active. Here is the output from normal operation:

vsa10027077:~ # ntpq -c rv 0
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.8p15@1.3728-o Mon Jun 21 18:17:38 UTC 2021 (1)",
processor="x86_64", system="Linux/4.12.14-122.124-default", leap=00,
stratum=5, precision=-24, rootdelay=26.314, rootdisp=51.471,
refid=147.204.9.204,
reftime=e689d98d.602a4dc4  Tue, Jul 26 2022  3:10:05.375,
clock=e689d9d4.bea84735  Tue, Jul 26 2022  3:11:16.744, peer=2989, tc=5,
mintc=3, offset=+0.212857, frequency=+2.033, sys_jitter=0.876471,
clk_jitter=0.843, clk_wander=0.063

Please have a look. Thanks.

Update 2022-08-09: I added "minpoll 3 maxpoll 6" to every NTP server line in /etc/ntp.conf and restarted ntpd. The rejection issue still happens, but it lasts much less time than before: it used to take 30+ hours, and now the host is back to normal after only 3 hours. But I'm still confused: I've set maxpoll to 6, which means the maximum poll interval should be 64 seconds, yet when I check ntpq it is already 256...

vsa9973928:/tmp/eisen # cat /etc/ntp.conf

driftfile /var/lib/ntp/drift/ntp.drift
logfile   /var/log/ntp

server 127.127.1.0
fudge  127.127.1.0 stratum 10

server timehost1.global.cloud.sap minpoll 3 maxpoll 6
server timehost2.global.cloud.sap minpoll 3 maxpoll 6
server timehost3.global.cloud.sap minpoll 3 maxpoll 6

# key configuration
keys /etc/ntp.keys
trustedkey 1
requestkey 1
controlkey 1

# by default act only as a basic NTP client
restrict default kod nomodify noquery notrap nopeer
restrict -6 default kod nomodify noquery notrap nopeer
restrict 127.0.0.1
restrict ::1
# allow NTP messages from the loopback address, useful for debugging
restrict localhost

### end of file
vsa9973928:/tmp/eisen # ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l 220m   64    0    0.000   +0.000   0.000
+147.204.9.202   10.46.141.8      5 u   40  512  377    1.742   +0.128   1.060
+147.204.9.203   162.159.200.123  4 u  274  512  377    1.730   +1.539   2.245
*147.204.9.204   162.159.200.1    4 u  148  512  377    1.803   +0.585   0.900

What made the poll interval exceed the limit set in ntp.conf? Has anyone seen this before?
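
To spell out the arithmetic behind my expectation: minpoll and maxpoll are exponents of 2, in seconds, while the poll column of ntpq shows plain seconds. A minimal sketch (the helper names are my own):

```python
def poll_seconds(exponent):
    """Convert a minpoll/maxpoll exponent to an interval in seconds."""
    return 2 ** exponent


def within_limits(observed_seconds, minpoll, maxpoll):
    """Check an observed poll interval against the configured exponents."""
    return poll_seconds(minpoll) <= observed_seconds <= poll_seconds(maxpoll)


MINPOLL, MAXPOLL = 3, 6   # from the server lines in /etc/ntp.conf above
OBSERVED = 512            # poll column for the three NTP servers above

print("allowed range:", poll_seconds(MINPOLL), "to", poll_seconds(MAXPOLL), "seconds")
print("observed", OBSERVED, "seconds within limits:",
      within_limits(OBSERVED, MINPOLL, MAXPOLL))
```

So with "minpoll 3 maxpoll 6" I would expect poll values between 8 and 64, while 512 corresponds to an exponent of 9.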

EisenWang
  • You're likely to get more viewers if you make your post readable. – Daniel Jul 24 '22 at 14:01
  • Yes. But it looks both ctrl-K and ctrl-Q all can't make the text to its original format as it works on stackoverflow... I modified it for 1 hour ... And this is the best effect I can have ... :'( – EisenWang Jul 24 '22 at 14:39
  • Please show your configuration (sans comments - you can use `grep '^[^#]' /etc/ntp.conf` to do this). You have an older clock configuration which should not be used, and possibly other problems which are preventing your configuration from working correctly. You should also show the output of `ntpq -c rv 0`. – Paul Gear Jul 25 '22 at 22:30
  • Hi, Paul. Thanks for your comments. I've updated the post. Pls have a look. Thanks again. – EisenWang Jul 26 '22 at 03:14
  • @gapsf Thanks. But the gap between ntp server and my VM is very small while ntp server rejecting my VM-- only +0.000069 sec... It should not be the reason... – EisenWang Jul 26 '22 at 04:09
  • Maybe just try servers from pool.ntp.org? – gapsf Jul 26 '22 at 05:10
  • Thanks. But no... we have to use these authorized ntp server on our own land... – EisenWang Jul 27 '22 at 04:26

0 Answers