
Recently I've encountered two instances in a VPC whose clocks have drifted. What I've noticed is that on those instances none of the time servers has the * or + prefix, unlike another instance with accurate time in the same Auto Scaling group.

“+” – Good and a preferred remote peer or server (included by the combine algorithm)

“*” – The remote peer or server presently used as the primary reference

EC2 instance where time has drifted

$ ntpq -p
    remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
y.ns.gin.ntt.ne .STEP.          16 u    - 1024    0    0.000    0.000   0.000
ns1.unico.com.a .STEP.          16 u    - 1024    0    0.000    0.000   0.000
saul.foodworks. .STEP.          16 u    - 1024    0    0.000    0.000   0.000
b.pool.ntp.uq.e .STEP.          16 u    - 1024    0    0.000    0.000   0.000
internalntpserver1. 10.68.10.1       8 u  815 1024  377    0.862  -477696 2391.53
internalntpserver2. 10.68.2.226      7 u  213 1024  377    1.755  -477012 1861.00

EC2 instance where time is correct

# ntpq -p
    remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
0.time.itoc.com .STEP.          16 u    - 1024    0    0.000    0.000   0.000
a.pool.ntp.uq.e .STEP.          16 u    - 1024    0    0.000    0.000   0.000
node01.au.verbn .STEP.          16 u    - 1024    0    0.000    0.000   0.000
node02.au.verbn .STEP.          16 u    - 1024    0    0.000    0.000   0.000
+internalntpserver1. 10.68.10.1       8 u  680 1024  377    1.551  -260.56  77.778
*internalntpserver2. 10.68.2.226      7 u  719 1024  377    0.631  -114.34 334.611

Restarting the ntpd daemon fixed it, but I can't find anything online explaining what could have caused this behaviour.

Any help would be very much appreciated.

Thank you.

Imagineer

1 Answer


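In the first example, both the jitter and the offset are very high. ntpq reports delay, offset and jitter in milliseconds, so the drifted instance is about 477 seconds (almost 8 minutes) away from its servers' time, with jitter of around 2 seconds. With jitter measured in seconds, NTP will probably just decide that both reference servers are insane and will refuse to sync.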
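A small Python sketch of that arithmetic, using the numbers from the question. The 1.5 s cut-off is an illustrative assumption chosen to echo ntpd's default tos maxdist selection threshold; real ntpd compares a root distance that combines delay, dispersion and jitter, not jitter alone, so treat looks_insane (a hypothetical name) as a rough rule of thumb.

```python
from typing import NamedTuple

# Illustrative threshold only (assumption, see lead-in), in seconds.
ASSUMED_MAX_JITTER_S = 1.5


class Reading(NamedTuple):
    remote: str
    offset_ms: float  # as printed by `ntpq -p` (milliseconds)
    jitter_ms: float  # as printed by `ntpq -p` (milliseconds)


def to_seconds(ms: float) -> float:
    return ms / 1000.0


def looks_insane(r: Reading, max_jitter_s: float = ASSUMED_MAX_JITTER_S) -> bool:
    """Rough check: the jitter alone already exceeds the cut-off."""
    return to_seconds(r.jitter_ms) > max_jitter_s


drifted = [
    Reading("internalntpserver1.", -477696, 2391.53),
    Reading("internalntpserver2.", -477012, 1861.00),
]

for r in drifted:
    print(f"{r.remote}: offset {to_seconds(r.offset_ms):.1f} s, "
          f"jitter {to_seconds(r.jitter_ms):.2f} s, insane={looks_insane(r)}")
```

Feeding in the healthy instance's readings (jitter of 77.778 and 334.611 ms) gives values well under a second, which is why those two servers still get the + and * marks.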

Your other problem is that the rule for NTP reference servers is "one or four". A man with two clocks is never sure which one is wrong; a man with three clocks can exclude the one that doesn't agree with the other two. But you should have four, in case one of them becomes unreachable.
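The "two clocks vs. three clocks" argument can be shown with a simplified version of Marzullo's algorithm, which is the idea behind NTP's intersection step (ntpd's real selection algorithm adds more rules on top). Each source is modelled as a hypothetical (low, high) interval that should contain the true time; a source is a falseticker if it falls outside the point where a strict majority of intervals overlap. Function names are illustrative, not part of any NTP implementation.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (low, high) bounds on the true offset, seconds


def best_agreement(intervals: List[Interval]) -> Tuple[int, Optional[float]]:
    """Largest number of intervals overlapping at one point, and that point."""
    events = []
    for lo, hi in intervals:
        events.append((lo, -1))  # interval opens; -1 sorts before +1 on ties
        events.append((hi, +1))  # interval closes
    events.sort()
    best, count, point = 0, 0, None
    for x, kind in events:
        count -= kind  # an opening adds one, a closing removes one
        if count > best:
            best, point = count, x
    return best, point


def falsetickers(intervals: List[Interval]) -> Optional[List[int]]:
    """Indices of sources outside the majority, or None if there is no majority."""
    best, point = best_agreement(intervals)
    if best <= len(intervals) // 2:
        return None  # e.g. two clocks that disagree: cannot tell which is wrong
    return [i for i, (lo, hi) in enumerate(intervals) if not lo <= point <= hi]


two_clocks = [(0.0, 0.2), (5.0, 5.2)]
three_clocks = [(0.0, 0.2), (0.1, 0.3), (5.0, 5.2)]
print(falsetickers(two_clocks), falsetickers(three_clocks))
```

With four sources, losing one (for example to a firewall) still leaves three, which is enough to out-vote a single bad clock.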

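The reachability of the other reference servers is also a big problem: they all show a reach of 0 and stratum 16, meaning none of the recent polls got an answer. You need to figure out what firewall is blocking the NTP packets (UDP port 123) going to those servers.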
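One way to test whether UDP 123 gets through from an instance is to send a single SNTP client request by hand and see whether anything comes back. This is a minimal Python sketch (sntp_query is a hypothetical helper, not an ntpd tool); it builds a 48-byte NTPv4 client packet and reads the server's transmit timestamp from bytes 40 to 47 of the reply. A timeout suggests the packets are being dropped somewhere.

```python
import socket
import struct

NTP_PORT = 123
NTP_TO_UNIX = 2208988800  # seconds between 1900-01-01 (NTP epoch) and 1970-01-01


def sntp_query(host: str, port: int = NTP_PORT, timeout: float = 2.0) -> float:
    """Send one SNTP client request and return the server's transmit time
    as Unix seconds. Raises socket.timeout if no reply arrives in time."""
    # First byte: LI=0, VN=4, Mode=3 (client) -> 0b00_100_011 = 0x23
    packet = b"\x23" + 47 * b"\0"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(packet, (host, port))
        data, _ = s.recvfrom(48)
    if len(data) < 48:
        raise ValueError("short NTP reply")
    # Transmit timestamp: 32-bit seconds + 32-bit fraction, network byte order.
    secs, frac = struct.unpack("!II", data[40:48])
    return secs - NTP_TO_UNIX + frac / 2**32
```

Run it against each pool server from the affected instance. Note that this probe sends from an ephemeral source port, whereas ntpd normally sends from port 123, so a successful reply here does not fully rule out a firewall or security-group rule keyed on the source port.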

tgharold
  • +1 ...and the two servers themselves should also be checked for sanity. It smells like one of them might be deriving its time from the other, since they show up at different strata (7 vs. 8) (yikes). – Michael - sqlbot May 05 '16 at 18:35
  • Found a possible solution, which we use in-house for our VMware VMs. The configuration directive tinker panic 0 tells ntpd not to give up (exit) when it sees a large jump in time. This is important for coping with large time drifts and for resuming virtual machines from a suspended state. Note: the tinker panic 0 directive must be at the top of the ntp.conf file. http://serverfault.com/questions/540791/what-are-the-disadvantages-of-disabling-tinker-panic-0-in-ntp – Imagineer May 09 '16 at 02:32
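Since the placement of tinker panic 0 matters according to that comment, a tiny Python check can confirm it is the first directive in ntp.conf, ignoring blank lines and # comments. The function name and the example config below are hypothetical; the four ntpN.example.internal server names are placeholders that also follow the "one or four" advice from the answer.

```python
def tinker_panic_first(conf_text: str) -> bool:
    """True if the first directive (ignoring blanks and # comments)
    is exactly `tinker panic 0`."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line:
            return line.split() == ["tinker", "panic", "0"]
    return False  # no directives at all


# Hypothetical ntp.conf excerpt (placeholder server names).
EXAMPLE_CONF = """\
# tinker panic 0 first, then four servers
tinker panic 0
server ntp1.example.internal iburst
server ntp2.example.internal iburst
server ntp3.example.internal iburst
server ntp4.example.internal iburst
"""

print(tinker_panic_first(EXAMPLE_CONF))
```

In practice you would read /etc/ntp.conf (or wherever your distribution keeps it) and pass its contents in.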