TLDR: I am getting "Leap 11", which means the server is unsynchronized. How do I fix this?

Some background on the problem I am trying to solve. I have a payload that will be on a spacecraft. The spacecraft provides NTP, but only one server. I will have a sub-payload installed in our payload. This sub-payload is being developed by a different organization. The sub-payload requires access to NTP. We have a dedicated network interface to communicate with the sub-payload. We cannot bridge or route traffic between the spacecraft network and the sub-payload network. Thus our payload needs to be a client of the spacecraft NTP server and a server for our sub-payload. Our payload is running Windows 7 SP1.

I spent considerable time trying to get Microsoft's w32tm time service to synchronize with NTP and act as an NTP server. I have given up on this.

I am now trying to use Meinberg NTP. This seems to be much better, but I still have problems. The server keeps reporting "leap not in sync" and "leap_alarm" and "leap 11". I've been digging for days and haven't figured out what is wrong. How can I get this to work?

The examples below are from a virtual machine running Windows 7, so I can easily experiment and get a reliable setup before installing this on our payload.

NTP on the Windows 7 system is synchronized to a remote server.

C:\Program Files (x86)\NTP\etc>ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.           6 l   65   64    2    0.000   +0.000   0.977
*128.138.XXX.XXX 128.138.YYY.YYY  2 u    6    8  377    0.977  +912.07 579.941

Info on the NTP installed on Windows 7:

C:\Program Files (x86)\NTP\etc>ntpq -crv
associd=0 status=c613 leap_alarm, sync_ntp, 1 event, spike_detect,
version="ntpd 4.2.8p15-o Jun 25 14:45:34 (UTC+02:00) 2020  (2)",
processor="x86", system="Windows", leap=11, stratum=3, precision=-10,
rootdelay=2.380, rootdisp=2952.787, refid=128.138.XXX.XXX,
reftime=e313d90f.d74bc8e2  Tue, Sep 22 2020  1:59:43.841,
clock=e313d911.12cbcb26  Tue, Sep 22 2020  1:59:45.073, peer=59255, tc=3,
mintc=3, offset=+0.000000, frequency=+0.000, sys_jitter=494.744833,
clk_jitter=0.977, clk_wander=0.000
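
For reference, the `status=c613` word in the first line can be unpacked by hand. The sketch below assumes the usual ntpq layout for the system status word (2 leap bits, 6 sync-source bits, a 4-bit event count and a 4-bit last-event code). The function name `decode_system_status` is made up for illustration, and only the names that actually appear in the outputs in this thread are listed.

```python
# Assumed layout of the 16-bit system status word printed by `ntpq -c rv`:
#   leap (2 bits) | sync source (6 bits) | event count (4 bits) | last event (4 bits)
# Only names seen in the outputs in this thread are listed; any other
# value is reported by number instead of being guessed.
LEAP_NAMES = {0: "leap_none", 3: "leap_alarm"}
SOURCE_NAMES = {0: "sync_unspec", 6: "sync_ntp"}
EVENT_NAMES = {3: "spike_detect", 5: "clock_sync", 6: "restart", 8: "no_sys_peer"}


def decode_system_status(hex_word):
    """Split a hex status word such as 'c613' into its four fields."""
    word = int(hex_word, 16)
    leap = (word >> 14) & 0x3
    source = (word >> 8) & 0x3F
    count = (word >> 4) & 0xF
    event = word & 0xF
    return {
        "leap": LEAP_NAMES.get(leap, f"leap code {leap}"),
        "source": SOURCE_NAMES.get(source, f"source code {source}"),
        "count": count,
        "event": EVENT_NAMES.get(event, f"event code {event}"),
    }


if __name__ == "__main__":
    for status in ("c613", "c018", "c016", "0615"):
        print(status, decode_system_status(status))
```

Read this way, `c613` carries leap bits 11 and a sync source of `sync_ntp` in the same word, which is the contradiction discussed further down.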

When I use ntpdate to query the NTP server running on my Windows 7 system, it fails:

C:\Program Files (x86)\NTP\etc>ntpdate -d -v 127.0.0.1
22 Sep 02:00:32 ntpdate[1512]: ntpdate 4.2.8p15-o Jun 25 14:48:06 (UTC+02:00) 2020  (2)
22 Sep 02:00:32 ntpdate[1512]: Raised to realtime priority class
transmit(127.0.0.1)
receive(127.0.0.1)
transmit(127.0.0.1)
receive(127.0.0.1)
transmit(127.0.0.1)
receive(127.0.0.1)
transmit(127.0.0.1)
receive(127.0.0.1)
127.0.0.1: Server dropped: leap not in sync

server 127.0.0.1, port 123
stratum 3, precision -10, leap 11, trust 000
refid [128.138.XXX.XXX], root delay 0.002380, root dispersion 3.470093
reference time:      e313d93f.d74bc6bf  Tue, Sep 22 2020  2:00:31.841
originate timestamp: e313d946.7acbce26  Tue, Sep 22 2020  2:00:38.479
transmit timestamp:  e313d946.7acbc713  Tue, Sep 22 2020  2:00:38.479
22 Sep 02:00:38 ntpdate[1512]: no server suitable for synchronization found
filter delay:  0.02658    0.02658    0.02658    0.02658
               ----       ----       ----       ----
filter offset: +0.000000  +0.000000  +0.000000  +0.000000
               ----       ----       ----       ----
delay 0.02658, dispersion 0.00000, offset +0.000000
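
The "leap not in sync" rejection comes from the leap indicator carried in the first byte of every NTP reply. As a rough sketch of that check (not ntpdate's actual code, which applies more tests than this), the header fields can be pulled out of a raw 48-byte packet as follows. The function names and the sample packets are hypothetical.

```python
def parse_ntp_header(packet):
    """Return (leap, version, mode, stratum) from a raw NTP packet."""
    if len(packet) < 48:
        raise ValueError("an NTP packet is at least 48 bytes long")
    first = packet[0]
    leap = first >> 6             # top two bits: leap indicator
    version = (first >> 3) & 0x7  # next three bits: protocol version
    mode = first & 0x7            # low three bits: 4 means "server"
    stratum = packet[1]
    return leap, version, mode, stratum


def looks_synchronized(packet):
    """Simplified acceptance test: leap must not be 3, stratum must be 1..15."""
    leap, _version, _mode, stratum = parse_ntp_header(packet)
    return leap != 3 and 1 <= stratum <= 15


if __name__ == "__main__":
    # Hand-built replies: first byte 0xE4 = leap 3, version 4, mode 4;
    # first byte 0x24 = leap 0, version 4, mode 4. Both claim stratum 3.
    unsynced = bytes([0xE4, 3]) + bytes(46)
    synced = bytes([0x24, 3]) + bytes(46)
    print(parse_ntp_header(unsynced), looks_synchronized(unsynced))
    print(parse_ntp_header(synced), looks_synchronized(synced))
```

A reply with stratum 3 and leap 11, like the one ntpdate printed above, fails this test even though the stratum looks healthy.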

And config file with comments removed. The sub-payload must have a working NTP server - a slightly incorrect server is better than no server. So I have configured the local clock as a source.

C:\Program Files (x86)\NTP\etc>type ntp.conf
restrict default noquery nopeer nomodify notrap
restrict -6 default noquery nopeer nomodify notrap
restrict 127.0.0.1
restrict -6 ::1
restrict 172.22.11.0
driftfile "C:\Program Files (x86)\NTP\etc\ntp.drift"
server 127.127.1.0
fudge 127.127.1.0 stratum 6
server 128.138.XXX.XXX iburst minpoll 3 maxpoll 7 prefer

EDIT: adding info for the ntpq readvar command.

C:\Program Files (x86)\NTP\etc>ntpq
ntpq> readvar
associd=0 status=c613 leap_alarm, sync_ntp, 1 event, spike_detect,
version="ntpd 4.2.8p15-o Jun 25 14:45:34 (UTC+02:00) 2020  (2)",
processor="x86", system="Windows", leap=11, stratum=3, precision=-10,
rootdelay=2.289, rootdisp=4553.250, refid=128.138.XXX.XXX,
reftime=e3152168.e15ea3de  Wed, Sep 23 2020  1:20:40.880,
clock=e315216d.520c0641  Wed, Sep 23 2020  1:20:45.320, peer=59255, tc=3,
mintc=3, offset=+0.000000, frequency=+6.278, sys_jitter=798.325045,
clk_jitter=0.977, clk_wander=0.000

ntpq> assoc
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 59254  90f4   yes   yes  none    reject   reachable 15
  2 59255  961a   yes   yes  none  sys.peer    sys_peer  1

ntpq> readvar 59254
associd=59254 status=90f4 conf, reach, sel_reject, 15 events, reachable,
srcadr=LOCAL(0), srcport=123, dstadr=127.0.0.1, dstport=123, leap=00,
stratum=6, precision=-10, rootdelay=0.000, rootdisp=10.000, refid=LOCL,
reftime=(no time), rec=e31520b1.e3d46a48  Wed, Sep 23 2020  1:17:37.889,
reach=010, unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=0,
flash=00 ok, keyid=0, ttl=0, offset=+0.000, delay=0.000,
dispersion=7937.988, jitter=0.977,
filtdelay=     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00,
filtoffset=   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00,
filtdisp=      0.98 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0

ntpq> readvar 59255
associd=59255 status=961a conf, reach, sel_sys.peer, 1 event, sys_peer,
srcadr=128.138.XXX.XXX, srcport=123, dstadr=172.22.11.179,
dstport=123, leap=00, stratum=2, precision=-18, rootdelay=1.312,
rootdisp=24.643, refid=128.138.YYY.YYY,
reftime=e3151e59.4db14951  Wed, Sep 23 2020  1:07:37.303,
rec=e3152180.e100aeac  Wed, Sep 23 2020  1:21:04.878, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=3, ppoll=3, headway=1, flash=00 ok,
keyid=0, offset=+4208.463, delay=0.977, dispersion=1.094, jitter=714.431,
xleave=0.001,
filtdelay=     0.98    0.98    0.98    0.98   29.29    0.98  118.15  197.25,
filtoffset= +4208.4 +4036.9 +3901.7 +3729.3 +3516.3 +3337.0 +3256.6 +3171.4,
filtdisp=      0.98    1.10    1.22    1.34    1.46    1.58    1.70    1.82
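
The `filtoffset` line gives a rough idea of how fast the offset is growing. The sketch below assumes the samples are listed newest first (the rising `filtdisp` values suggest this) and that they are spaced one poll interval apart, which is 2^3 = 8 seconds for `hpoll=3`. Both are assumptions, and `offset_growth_ms_per_s` is a made-up helper name.

```python
# Offsets in milliseconds, copied from the filtoffset line of association 59255.
FILTOFFSET_MS = [4208.4, 4036.9, 3901.7, 3729.3, 3516.3, 3337.0, 3256.6, 3171.4]

# hpoll=3 means a poll interval of 2**3 seconds; the sample spacing is
# assumed to equal the poll interval.
POLL_SECONDS = 2 ** 3


def offset_growth_ms_per_s(offsets_newest_first, spacing_seconds):
    """Average growth of the offset between the oldest and newest sample."""
    newest = offsets_newest_first[0]
    oldest = offsets_newest_first[-1]
    elapsed = (len(offsets_newest_first) - 1) * spacing_seconds
    return (newest - oldest) / elapsed


if __name__ == "__main__":
    rate = offset_growth_ms_per_s(FILTOFFSET_MS, POLL_SECONDS)
    print(f"offset grows by about {rate:.1f} ms per second")
```

Under those assumptions the offset grows by roughly 18.5 ms for every second of wall time, so the local clock is falling behind the server quickly rather than settling.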

EDIT2: Removing the local clock as a source. Changed config and rebooted.

Config file with comments removed.

C:\Program Files (x86)\NTP\etc>type ntp.conf
restrict default noquery nopeer nomodify notrap
restrict -6 default noquery nopeer nomodify notrap
restrict 127.0.0.1
restrict -6 ::1
restrict 172.22.11.0
driftfile "C:\Program Files (x86)\NTP\etc\ntp.drift"
server time.colorado.edu iburst minpoll 3 maxpoll 7 prefer

C:\Program Files (x86)\NTP\etc>ntpq
ntpq> readvar
associd=0 status=c613 leap_alarm, sync_ntp, 1 event, spike_detect,
version="ntpd 4.2.8p15-o Jun 25 14:45:34 (UTC+02:00) 2020  (2)",
processor="x86", system="Windows", leap=11, stratum=3, precision=-10,
rootdelay=2.319, rootdisp=899.108, refid=128.138.XXX.XXX,
reftime=e3153daa.b2084280  Wed, Sep 23 2020  3:21:14.695,
clock=e3153db1.9672107e  Wed, Sep 23 2020  3:21:21.587, peer=35575, tc=3,
mintc=3, offset=+0.000000, frequency=+16.909, sys_jitter=337.424222,
clk_jitter=0.977, clk_wander=0.000

ntpq> peer
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*128.138.XXX.XXX 128.138.YYY.YYY  2 u    5    8   377   0.977  +5100.9 810.260

ntpq> assoc
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 35575  961a   yes   yes  none  sys.peer    sys_peer  1

ntpq> readvar 35575
associd=35575 status=961a conf, reach, sel_sys.peer, 1 event, sys_peer,
srcadr=128.138.XXX.XXX, srcport=123, dstadr=172.22.11.179,
dstport=123, leap=00, stratum=2, precision=-18, rootdelay=1.343,
rootdisp=38.300, refid=128.138.YYY.YYY,
reftime=e3153bd3.adb315a2  Wed, Sep 23 2020  3:13:23.678,
rec=e3153e8e.af79f93e  Wed, Sep 23 2020  3:25:02.685, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=3, ppoll=3, headway=0, flash=00 ok,
keyid=0, offset=+4486.303, delay=0.977, dispersion=1.092, jitter=606.007,
xleave=0.000,
filtdelay=     0.98    0.98    0.98    0.98    0.98   97.64    0.98    0.98,
filtoffset= +4486.3 +4378.9 +4227.2 +4084.3 +3943.5 +3853.0 +3658.4 +3511.8,
filtdisp=      0.98    1.10    1.22    1.34    1.46    1.58    1.70    1.82

EDIT3: I built a fresh Windows 7 Enterprise SP1 system. Installed a simple configuration of Meinberg NTP using two time servers here on campus. Win7 refuses to synchronize with those servers. I wrote a simple script to try and force it to synchronize, but no success.

@echo off

echo NTP PEERS >> ntp-query.log 2>&1
ntpq -p >> ntp-query.log 2>&1

echo. >> ntp-query.log 2>&1
echo. >> ntp-query.log 2>&1
echo NTP ASSOCIATIONS >> ntp-query.log 2>&1
ntpq -c as >> ntp-query.log 2>&1

echo. >> ntp-query.log 2>&1
echo. >> ntp-query.log 2>&1
echo NTP READVARS >> ntp-query.log 2>&1
ntpq -c rv >> ntp-query.log 2>&1

echo. >> ntp-query.log 2>&1
echo. >> ntp-query.log 2>&1
echo STOP NTP SERVICE >> ntp-query.log 2>&1
net stop ntp >> ntp-query.log 2>&1

echo. >> ntp-query.log 2>&1
echo. >> ntp-query.log 2>&1
echo SYNC CLOCK >> ntp-query.log 2>&1
ntpdate -b -v time.colorado.edu >> ntp-query.log 2>&1

echo. >> ntp-query.log 2>&1
echo. >> ntp-query.log 2>&1
echo START NTP SERVICE >> ntp-query.log 2>&1
net start ntp >> ntp-query.log 2>&1

echo. >> ntp-query.log 2>&1
echo. >> ntp-query.log 2>&1
echo NTP PEERS >> ntp-query.log 2>&1
ntpq -p >> ntp-query.log 2>&1

echo. >> ntp-query.log 2>&1
echo. >> ntp-query.log 2>&1
echo NTP ASSOCIATIONS >> ntp-query.log 2>&1
ntpq -c as >> ntp-query.log 2>&1

echo. >> ntp-query.log 2>&1
echo. >> ntp-query.log 2>&1
echo NTP READVARS >> ntp-query.log 2>&1
ntpq -c rv >> ntp-query.log 2>&1

Contents of ntp-query.log:

NTP PEERS
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 tcom-gw-loop.co 128.138.140.44   2 u   65   64    3    3.897  +2854.1 1695.66
 utcnist2.colora .NIST.           1 u   59   64    3   19.527  +2981.0 1708.88


NTP ASSOCIATIONS
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 48693  901a   yes   yes  none    reject    sys_peer  1
  2 48694  901a   yes   yes  none    reject    sys_peer  1


NTP READVARS
associd=0 status=c018 leap_alarm, sync_unspec, 1 event, no_sys_peer,
version="ntpd 4.2.8p15-o Jun 25 14:45:34 (UTC+02:00) 2020  (2)",
processor="x86", system="Windows", leap=11, stratum=2, precision=-10,
rootdelay=19.771, rootdisp=1645.581, refid=128.138.141.172,
reftime=e323864f.c85581a0  Sat, Oct  3 2020 23:22:55.782,
clock=e32386e0.89558364  Sat, Oct  3 2020 23:25:20.536, peer=0, tc=6,
mintc=3, offset=+0.000000, frequency=+0.000, sys_jitter=0.000000,
clk_jitter=0.977, clk_wander=0.000


STOP NTP SERVICE
The Network Time Protocol Daemon service is stopping.
The Network Time Protocol Daemon service was stopped successfully.



SYNC CLOCK
 3 Oct 23:25:23 ntpdate[1688]: ntpdate 4.2.8p15-o Jun 25 14:48:06 (UTC+02:00) 2020  (2)
 3 Oct 23:25:23 ntpdate[1688]: Raised to realtime priority class
 3 Oct 23:25:33 ntpdate[1688]: step time server 128.138.82.156 offset +4.142855 sec


START NTP SERVICE
The Network Time Protocol Daemon service is starting.
The Network Time Protocol Daemon service was started successfully.



NTP PEERS
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 tcom-gw-loop.co 128.138.140.44   2 u    1   64    1   35.147  +68.050   0.977
 utcnist2.colora .NIST.           1 u    1   64    1   16.598  +56.055   0.977


NTP ASSOCIATIONS
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 49563  9014   yes   yes  none    reject   reachable  1
  2 49564  9014   yes   yes  none    reject   reachable  1


NTP READVARS
associd=0 status=c016 leap_alarm, sync_unspec, 1 event, restart,
version="ntpd 4.2.8p15-o Jun 25 14:45:34 (UTC+02:00) 2020  (2)",
processor="x86", system="Windows", leap=11, stratum=16, precision=-10,
rootdelay=0.000, rootdisp=0.045, refid=INIT, reftime=(no time),
clock=e32386f0.0b989634  Sat, Oct  3 2020 23:25:36.045, peer=0, tc=3,
mintc=3, offset=+0.000000, frequency=+0.000, sys_jitter=0.000000,
clk_jitter=0.977, clk_wander=0.000
  • What is the actual environment for "payload" and "subpayload"? If they are running on same hardware, they should share the same clock, so normal clock operations should be fine for the subpayload? Or is there some specific need for the subpayload to use NTP? – Tero Kilkanen Sep 22 '20 at 06:25
  • Two different computers with two very different functions. They are physically distinct systems. The virtual machine in my example here is only for testing how to setup NTP, and does not reflect the real hardware. – Jim Wright Sep 22 '20 at 16:17
  • It might be that the NTP client misbehaves because it is running on the same machine as the NTP server. The NTP client tries to adjust the clock that the NTP server depends on. Try using an NTP client from a separate computer. – Tero Kilkanen Sep 22 '20 at 19:38
  • In this test, the simulated payload NTP client is on a virtual machine hosted on a server in my lab. This client is using an NTP server on campus, separate from the server and the virtual machine. – Jim Wright Sep 22 '20 at 20:08
  • Let's try to have a more realistic test environment. How is your NTP server going to synchronize while in space? GPS? Not at all? – Michael Hampton Sep 22 '20 at 21:12
  • Is your question example incorrect then? There you are using `ntpdate` client with NTP server running on localhost `127.0.0.1`. This implies that in your test, you were using NTP client to connect to NTP server running on the same system. – Tero Kilkanen Sep 22 '20 at 21:34
  • I used ntpdate with localhost to query what is happening on the Windows 7 machine. I use "-d" for debug so it makes no changes. I can do the same query from a remote machine and get the same result. – Jim Wright Sep 22 '20 at 23:34
  • Sorry, it doesn't get "more realistic". NASA provides the NTP server, I am building the payload. – Jim Wright Sep 22 '20 at 23:35

1 Answer

What I think is happening here is that your system and your upstream NTP disagree on whether there's a leap second this month. However, this is a guess, and it may not apply to your situation.

Here's an example from my system to show how to work it out. My local server says there's no leap second (leap_none, and leap=00):

ntpq> rv
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.8p12@1.3728-o (1)", processor="x86_64",
system="Linux/4.19.0-9-amd64", leap=00, stratum=2, precision=-18,
rootdelay=0.564, rootdisp=6.541, refid=172.22.254.53,
reftime=e3150f56.6c8ad2f3  Wed, Sep 23 2020 10:03:34.423,
clock=e3151066.bbdbd233  Wed, Sep 23 2020 10:08:06.733, peer=8919, tc=6,
mintc=3, offset=0.261720, frequency=10.858, sys_jitter=0.191792,
clk_jitter=0.486, clk_wander=0.013

Find the value from my sync peer:

# ntpq -n
ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 ntp.ubuntu.com  .POOL.          16 p    -   64    0    0.000    0.000   0.004
 time.apple.com  .POOL.          16 p    -   64    0    0.000    0.000   0.004
 time.windows.co .POOL.          16 p    -   64    0    0.000    0.000   0.004
-2001:44b8:2100: 239.29.146.228   2 s  190 1024  377    0.438   -0.020   0.099
*172.22.254.53   .PPS.            1 s  346 1024  377    0.770   -0.063  10.898
+172.22.160.64   172.22.254.53    2 s  361 1024  377    0.389   -0.154   0.237
+172.22.160.61   17.253.66.253    2 s  117 1024  377    0.661   -0.052   0.108
 47.51.249.154   .PPS.            1 u  620 1024    1  232.253    1.231   0.385
 2a03:2880:ff0c: .FB...           1 u  225 1024  377   17.630    0.167   4.988
 2606:4700:f1::1 10.26.8.188      3 u  399 1024  377   19.256   -0.003   2.240
-2001:44b8:1::1  203.35.83.242    2 u  375 1024  377    6.329   -1.184   6.398

The peer is the 5th association. Get the association id. We could have skipped the peers display above and just looked for the one in the association table with the condition sys.peer.

ntpq> assoc
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 37567  8811   yes  none  none    reject    mobilize  1
  2 37568  8811   yes  none  none    reject    mobilize  1
  3 37569  8811   yes  none  none    reject    mobilize  1
  4 37570  934a   yes   yes  none   outlier    sys_peer  4
  5 37571  96fa   yes   yes  none  sys.peer    sys_peer 15
  6 37572  94fa   yes   yes  none candidate    sys_peer 15
  7 37573  94fa   yes   yes  none candidate    sys_peer 15
  8 37574  9014   yes   yes  none    reject   reachable  1
  9 37575  9024   yes   yes  none    reject   reachable  2
 10 37576  9024   yes   yes  none    reject   reachable  2
 11 37577  931a   yes   yes  none   outlier    sys_peer  1

Now get the values for association 37571 (your numbers will be different):

ntpq> rv 37571
associd=37571 status=96fa conf, reach, sel_sys.peer, 15 events, sys_peer,
srcadr=172.22.254.53, srcport=123, dstadr=172.22.254.2, dstport=123,
leap=00, stratum=1, precision=-19, rootdelay=0.000, rootdisp=1.099,
refid=PPS, reftime=e3150c9d.0367d977  Wed, Sep 23 2020  9:51:57.013,
rec=e3150ca5.0ac60428  Wed, Sep 23 2020  9:52:05.042, reach=377,
unreach=0, hmode=1, pmode=2, hpoll=10, ppoll=10, headway=0, flash=00 ok,
keyid=0, offset=-0.063, delay=0.770, dispersion=15.496, jitter=10.898,
xleave=0.096,
filtdelay=     0.77    0.82    1.51    1.13    0.63   91.45    1.09    0.82,
filtoffset=   -0.06   -0.10   -0.44   -0.01   -0.20   28.77   -0.28   -0.02,
filtdisp=      0.01   15.79   31.79   47.50   62.87   78.35   94.22  110.32

So it says leap=00 also, and we're all good.

Your configuration has a local clock and a remote peer. Find the leap indicator values for both - I'll bet they're different. When you've only got 2 peers, you can't get a majority on the value of the leap indicator, so it can't make a good decision. Try taking out the local clock or adding another remote peer whose leap indicator agrees with your current sync peer.
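
If you want to compare the values without reading each `rv` dump by eye, something like the following sketch would do. It assumes you have saved the text of `ntpq -c rv` and `ntpq -c "rv <assid>"` somewhere; `parse_rv` and `leap_mismatches` are hypothetical helper names, and the sample text is abbreviated from the question.

```python
import re


def parse_rv(text):
    """Collect name=value pairs from ntpq readvar output into a dict."""
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', text)
    return {name: value.strip('"') for name, value in pairs}


def leap_mismatches(system_text, peer_texts):
    """Return the srcadr of every peer whose leap differs from the system's."""
    system_leap = parse_rv(system_text).get("leap")
    mismatched = []
    for text in peer_texts:
        peer = parse_rv(text)
        if peer.get("leap") != system_leap:
            mismatched.append(peer.get("srcadr"))
    return mismatched


if __name__ == "__main__":
    # Abbreviated from the outputs in the question.
    system_rv = 'associd=0 status=c613 leap_alarm, sync_ntp, 1 event, spike_detect,\n' \
                'processor="x86", system="Windows", leap=11, stratum=3, precision=-10,'
    peer_rv = 'associd=59255 status=961a conf, reach, sel_sys.peer, 1 event, sys_peer,\n' \
              'srcadr=128.138.XXX.XXX, srcport=123, dstadr=172.22.11.179,\n' \
              'dstport=123, leap=00, stratum=2, precision=-18, rootdelay=1.312,'
    print(leap_mismatches(system_rv, [peer_rv]))
```

An empty list means every peer agrees with the system value. Any address that is printed is a peer whose leap indicator differs from the local daemon's.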

P.S. Please tell me you're not going to put Windows 7 on a spaceship. :-)

Paul Gear
  • I agree with your comment on 1 or 2 peers being not desirable. But the spacecraft only provides one NTP server. And if that NTP server has issues for some reason, my local payload will refuse to serve NTP info to the sub-payload. Hence the need to fall back to the local clock. Thus I'm stuck with 2 peers. I used "readvar" on the local payload NTP server, and on the two associations - the local clock and the remote NTP. The local clock and remote NTP both have leap=00, while the local NTP has leap=11. I fought for years to not have Windows. Failed. We have suffered ever since. – Jim Wright Sep 23 '20 at 01:26
  • Does the Meinberg NTP implementation on Windows supply a leap second file? It might be that the file is missing, hence the report of leap=11. https://kb.meinbergglobal.com/kb/time_sync/ntp/configuration/ntp_leap_second_file – Paul Gear Sep 23 '20 at 11:46
  • I am looking into leap seconds, but I believe the problem is not with leap seconds. They allocated two bits for the leap second indicator, but only have three states. So they overloaded it to have the fourth state mean "clock is free running". I think the problem is that the payload NTP server keeps saying it is not synchronized, but I can't figure out why. Looking at the source, it appears one reason for setting LEAP_NOTINSYNC is because the clock was stepped. So if the clock is constantly stepped, maybe this explains it. But then why is it being stepped, and how to fix it? – Jim Wright Sep 23 '20 at 17:43
  • Your first peers output shows a relatively high offset and jitter. It's not unlikely that steps are happening. But it also says it's synced to the peer, and so the leap=11 shouldn't happen at the same time. It also hasn't managed to baseline your frequency error, which is still at zero. So maybe the local clock on your device is struggling for accuracy. I would suggest not setting maxpoll so low (probably leave it out altogether), enabling all the statistics logging you can, and letting it run for several hours to see what numbers pop out. – Paul Gear Sep 23 '20 at 21:04
  • Yes, I can't make sense of the contradictory indications. I investigated some to see if using NTP on a virtual machine is a problem, but I see Microsoft, VMware and Red Hat all recommending it. And I have had numerous Linux virtual machines running NTP as both client and client/server for years with no problem. – Jim Wright Sep 24 '20 at 22:23
  • Any joy with the logging @JimWright? Feel free to drop me a message or visit #ntp on Freenode IRC to chat. – Paul Gear Sep 25 '20 at 20:04
  • I've finally had time to continue working on this. I built a fresh new Win7 machine with all updates (what a pain) in case my previous test machine was messed up. I then installed NTP. I set it up with two campus servers plus local clock. It always used local clock. I removed local clock, and it is not able to maintain sync. Even after stop/ntpdate/start cycle, it loses sync within minutes and after an hour the clock has drifted 5 seconds. A Linux VM on same server works fine as both NTP client and server. Still can't figure this out. – Jim Wright Oct 02 '20 at 23:35
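
The drift of 5 seconds per hour reported in the last comment can be expressed in parts per million, the unit ntpd uses for frequency. The sketch below is plain arithmetic. The 500 ppm figure is ntpd's usual limit on frequency correction and is stated here from general knowledge of ntpd rather than from anything measured in this thread.

```python
# ntpd's usual ceiling on frequency correction, in parts per million.
# This is an assumption about the build in use, not a measured value.
MAX_CORRECTION_PPM = 500.0


def drift_ppm(drift_seconds, elapsed_seconds):
    """Convert an observed drift over an interval into parts per million."""
    return drift_seconds / elapsed_seconds * 1e6


if __name__ == "__main__":
    rate = drift_ppm(5.0, 3600.0)  # 5 seconds of drift in one hour
    print(f"{rate:.0f} ppm")       # prints "1389 ppm"
    print("within ntpd's correction range:", rate <= MAX_CORRECTION_PPM)
```

A clock that drifts at about 1389 ppm cannot be held by frequency adjustment alone if the limit is 500 ppm, which would be consistent with the repeated loss of sync described above.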