I'm trying to determine why the NodeClockNotSynchronising alert is firing for a handful of the VMs I've provisioned (not all of them, just a few, which is strange).
According to the metrics that are exported, I'm seeing:
```
# HELP node_timex_sync_status Is clock synchronized to a reliable server (1 = yes, 0 = no).
# TYPE node_timex_sync_status gauge
node_timex_sync_status 0
```
I can SSH into one of the affected VMs: `ntpd` is running and the `date` command returns the correct time.
Digging into the `timex` collector's documentation and code, this is the check that is "failing":
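```go
var syncStatus float64
var divisor float64 // used further down in the collector (not shown here)
var timex = new(unix.Timex)

// With Modes == 0, adjtimex only reads the kernel's clock state.
status, err := unix.Adjtimex(timex)
if err != nil {
	return fmt.Errorf("failed to retrieve adjtimex stats: %w", err)
}

// timeError is a constant in the collector equal to the kernel's TIME_ERROR (5).
if status == timeError {
	syncStatus = 0
} else {
	syncStatus = 1
}
```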
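To reproduce the collector's check outside node_exporter on an affected VM, a minimal sketch like the following could help. It assumes Linux and uses the standard library's `syscall.Adjtimex` rather than `golang.org/x/sys/unix`; the `syncStatus` helper is hypothetical and only mirrors the branch in the collector snippet above.

```go
// Minimal sketch (Linux only): query the kernel's NTP state the same way the
// timex collector does, but using only the Go standard library.
package main

import (
	"fmt"
	"os"
	"syscall"
)

// timeError mirrors TIME_ERROR (5) from <linux/timex.h>.
const timeError = 5

// syncStatus is a hypothetical helper mapping the adjtimex return value to
// what the collector exports: 0 for TIME_ERROR, 1 for anything else.
func syncStatus(state int) float64 {
	if state == timeError {
		return 0
	}
	return 1
}

func main() {
	// A zeroed Timex (Modes == 0) makes adjtimex a read-only query.
	var tx syscall.Timex
	state, err := syscall.Adjtimex(&tx)
	if err != nil {
		fmt.Fprintln(os.Stderr, "adjtimex failed:", err)
		return
	}
	fmt.Printf("adjtimex state: %d\n", state)
	fmt.Printf("node_timex_sync_status would be: %g\n", syncStatus(state))
	fmt.Printf("status bits: 0x%04x\n", tx.Status)
	fmt.Printf("maxerror: %d us, esterror: %d us\n", tx.Maxerror, tx.Esterror)
}
```

Running it with `go run` on a healthy VM and on an affected one should show whether the kernel itself reports state 5, and which status bits differ between the two.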
Since `syncStatus` is 0, the alert fires. Looking into the return codes of the `adjtimex()` syscall, I found:
```c
#define TIME_ERROR 5 /* clock not synchronized */
```
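As far as I can tell from `kernel/time/ntp.c` (`is_error_status()`), the kernel has no notion of whether `ntpd` is running; it decides on `TIME_ERROR` from the bits in the `status` field of `struct timex`. Below is a rough Go port of that check, with bit values copied from `<linux/timex.h>`. `reportsTimeError` is a hypothetical helper, and the exact condition may differ between kernel versions.

```go
// Sketch of the kernel's TIME_ERROR condition (is_error_status() in
// kernel/time/ntp.c), using the STA_* bit values from <linux/timex.h>.
package main

import "fmt"

const (
	staPPSFreq   = 0x0002 // STA_PPSFREQ: PPS frequency discipline enabled
	staPPSTime   = 0x0004 // STA_PPSTIME: PPS time discipline enabled
	staUnsync    = 0x0040 // STA_UNSYNC: clock unsynchronized
	staPPSSignal = 0x0100 // STA_PPSSIGNAL: PPS signal present
	staPPSJitter = 0x0200 // STA_PPSJITTER: PPS jitter exceeded
	staPPSWander = 0x0400 // STA_PPSWANDER: PPS wander exceeded
	staPPSError  = 0x0800 // STA_PPSERROR: PPS calibration error
	staClockErr  = 0x1000 // STA_CLOCKERR: clock hardware fault
)

// reportsTimeError reports whether the kernel would return TIME_ERROR from
// adjtimex for the given status word (hypothetical helper).
func reportsTimeError(status int32) bool {
	switch {
	case status&(staUnsync|staClockErr) != 0:
		// Clock explicitly marked unsynchronized, or a hardware fault.
		return true
	case status&(staPPSFreq|staPPSTime) != 0 && status&staPPSSignal == 0:
		// PPS discipline requested but no PPS signal present.
		return true
	case status&(staPPSTime|staPPSJitter) == staPPSTime|staPPSJitter:
		// PPS time discipline requested and jitter exceeded.
		return true
	case status&staPPSFreq != 0 && status&(staPPSWander|staPPSError) != 0:
		// PPS frequency discipline requested with wander or calibration error.
		return true
	}
	return false
}

func main() {
	// 0x2001 is STA_PLL|STA_NANO; 0x2041 additionally sets STA_UNSYNC.
	for _, s := range []int32{0x2001, 0x2041} {
		fmt.Printf("status 0x%04x -> TIME_ERROR: %v\n", s, reportsTimeError(s))
	}
}
```

Feeding it the status value read on an affected VM would show which of these conditions is being hit, for example whether `STA_UNSYNC` (0x0040) is set.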
Why would the kernel return `TIME_ERROR` when `ntpd` is running and the clock is synchronized? Any help would be greatly appreciated.