I have a large web-based application running in AWS with numerous EC2 instances. Occasionally -- about twice or thrice per week -- I receive an alarm notification from my Sensu monitoring system notifying me that one of my instances has hit 100% CPU.
This is the notification:
CheckCPU TOTAL WARNING: total=100.0 user=0.0 nice=0.0 system=0.0 idle=25.0 iowait=100.0 irq=0.0 softirq=0.0 steal=0.0 guest=0.0
Host: my_host_name
Timestamp: 2016-09-28 13:38:57 +0000
Address: XX.XX.XX.XX
Check Name: check-cpu-usage
Command: /etc/sensu/plugins/check-cpu.rb -w 70 -c 90
Status: 1
Occurrences: 1
This seems to be a momentary occurrence and the CPU goes back down to normal levels within seconds. So it seems like something not to get too worried about. But I'm still curious why it is happening. Notice that the CPU is taken up with the 100% IOWaits.
FYI, Amazon's monitoring system doesn't notice this blip. See the images below showing the CPU & IOlevels at 13:38
Interestingly, AWS says tells me that this instance will be retired soon. Might that be the two be related?