Usually, gaps in graphs are the result of the Munin server not being able to contact the node. This can happen because the server is too busy (too many nodes to process), the node is too busy or unreachable, or the network is down.
In this case, the gaps are roughly 15 minutes long and recur at suspiciously regular times, each one starting at :55 past the hour. This usually points to a scheduled job of some sort that causes Munin to fail at these times:
- 6:55 to 7:10
- 7:55 to 8:15
- 8:55 to 9:10
- 9:55 to 10:10
- 10:55 to 11:10
- 11:55 to 12:10
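To make the "suspiciously regular" observation concrete, here is a small Python sketch that uses only the gap times listed above. It computes the minute past the hour at which each gap starts, the time between consecutive gap starts, and each gap's duration. The names `GAPS` and `analyse` are just illustrative.

```python
from datetime import datetime

# Gap windows copied from the list above (start, end), 24-hour clock.
GAPS = [
    ("6:55", "7:10"),
    ("7:55", "8:15"),
    ("8:55", "9:10"),
    ("9:55", "10:10"),
    ("10:55", "11:10"),
    ("11:55", "12:10"),
]


def to_time(hhmm: str) -> datetime:
    """Parse 'H:MM' into a datetime (the date part is irrelevant here)."""
    return datetime.strptime(hhmm, "%H:%M")


def minutes_between(a: datetime, b: datetime) -> int:
    """Whole minutes from a to b."""
    return int((b - a).total_seconds() // 60)


def analyse(gaps):
    starts = [to_time(s) for s, _ in gaps]
    ends = [to_time(e) for _, e in gaps]
    start_minutes = {t.minute for t in starts}  # minute past the hour each gap begins
    periods = [minutes_between(a, b) for a, b in zip(starts, starts[1:])]
    durations = [minutes_between(s, e) for s, e in zip(starts, ends)]
    return start_minutes, periods, durations


start_minutes, periods, durations = analyse(GAPS)
print("gaps start at minute(s):", sorted(start_minutes))
print("time between gap starts (min):", periods)
print("gap durations (min):", durations)
```

Every gap starts at minute 55, the gaps recur every 60 minutes, and all but one last 15 minutes (7:55 to 8:15 lasts 20). That fits something that runs once an hour, so anything scheduled at minute 55 (in crontab syntax, `55 * * * *`) is a good first suspect.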
I'd start my investigation by manually monitoring CPU, disk and network activity on both the server and the node at the same time, between 10:55 and 11:10 (`top` and `ping` will give you a ballpark idea).
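If you would rather have a timestamped log than watch `top` and `ping` by eye, the following is a minimal Python sketch. It swaps ICMP ping for a plain TCP connect to munin-node's default port, 4949, because raw ICMP normally needs root. On each sample it records whether the node accepted the connection, how long the connect took, and the local 1-minute load average (`os.getloadavg` is Unix-only). The hostname `node.example.com` and the helper names are hypothetical placeholders.

```python
import os
import socket
import time
from datetime import datetime

NODE_HOST = "node.example.com"  # hypothetical: replace with your node's hostname
NODE_PORT = 4949                # munin-node's default listening port


def probe(host: str, port: int, timeout: float = 5.0) -> tuple[bool, float]:
    """Attempt a TCP connect; return (reachable, seconds the attempt took)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:  # refused, unreachable, DNS failure or timeout
        ok = False
    return ok, time.monotonic() - start


def sample_line(host: str, port: int) -> str:
    """One log line: wall-clock time, reachability, connect time, local load."""
    ok, elapsed = probe(host, port)
    load1, _, _ = os.getloadavg()  # local 1-minute load average (Unix only)
    return f"{datetime.now():%H:%M:%S} reachable={ok} connect={elapsed:.3f}s load1={load1:.2f}"


def monitor(host: str, port: int, duration_s: float, interval_s: float = 10.0) -> None:
    """Print one sample every interval_s seconds for duration_s seconds."""
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        print(sample_line(host, port), flush=True)
        time.sleep(interval_s)
```

For example, start `monitor(NODE_HOST, NODE_PORT, duration_s=20 * 60)` a few minutes before 10:55 and run it on both the server and the node. If you pipe the output to a file, you can line up the two logs against the gap in the graph.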