0

I run < 24 checks on my systems. Servers are not regularly heavily loaded. Load averages keep well under 1 during normal operation.

I have noticed a re-occurring issue where the check-cpu check would start triggering high load averages on systems where there was no organic cause for high load. Further investigation showed the high load report was actually due to the check-cpu script running in parallel with other checks. Outside of the checks executing, cpu load was fine.

I upgraded from sensu 0.20 to 0.23 and continued to observe the same issue.

We found that a re-start of the sensu-server and sensu-client services would resolve the problem for a period of time (approximately 24 hours) and then it would return.

We theorized at this point, there must be some sort of time-delay in the dispatch / execution of the checks on the host which causes this overlap to eventually occur.

All checks are set to run at an interval of 30 or 60.

I decided to set the interval of the check-cpu check to 83, and the issue has not occurred since. Presumably because the check-cpu check does not coincide with any others, thus not seeing high cpu load during that short moment.

Is this some sort of inherent scheduling issue with sensu? Is it supposed to know how to dispatch checks with adequate spacing, or is this something that should be controlled by the interval parameter?

Thanks!

Kobbe
  • 809
  • 5
  • 16
dank
  • 579
  • 3
  • 11

1 Answers1

2

I have noticed that the checks drift in execution time. i.e they do not run exactly every 30 seconds but every 30.001s or something like that. I guess the drift might be different on different checks. So eventually you will run into the problem that the checks sync up and all run at the same time, causing the problem. Running more checks at regular intervals (30s, 60s etc) will make this problem occur more often. If you want a change to this problem you have to report it to sensu directly. I think they might fix it eventually since they probably want the system to be scalable.

Kobbe
  • 809
  • 5
  • 16
  • 1
    Thanks for the input! This is very helpful. I have opened this issue with Sensu: https://github.com/sensu/sensu/issues/1260 – dank May 10 '16 at 16:09