
We are seeing what appear to be invalid values in top on more than one of our AWS instances:

top - 08:44:56 up 259 days, 17:31,  1 user,  load average: 0.01, 0.02, 0.01
Tasks:  78 total,   1 running,  77 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,29800.0%id,  0.0%wa,  0.0%hi,  0.0%si,-237844987904.0%st
Mem:   2049560k total,  1682416k used,   367144k free,   183296k buffers
Swap:        0k total,        0k used,        0k free,   838992k cached

Idle time (id) is 29,800% and steal time (st) is -237,844,987,904.0%. The server appears to struggle, and our monitors report it as down.

Does anybody have any insight into what is happening here? The load average is 0.01 and our processes are not doing much at all, so I don't think we are causing the issue.

Could this be the EC2 host machine being overloaded?
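For reference, here is a minimal sketch of how percentages like these can be derived, assuming top computes them from the difference between two readings of the aggregate `cpu` line in /proc/stat. The helper names (`parse_cpu_line`, `cpu_percentages`, `read_cpu_line`) are hypothetical and only for illustration. If a counter such as steal moves backwards between two samples, this arithmetic produces negative or oversized percentages; whether that is what happened on our instances is not confirmed.

```python
# Column order of the aggregate "cpu" line in /proc/stat (first eight counters).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer columns; treat missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(prev, curr):
    """Return (percentages, backwards) for two samples.

    percentages maps each field to its share of the total delta, or is None
    when the total delta is zero. backwards lists the fields whose counter
    decreased, which should never happen with healthy counters.
    """
    deltas = {f: curr[f] - prev[f] for f in FIELDS}
    backwards = [f for f in FIELDS if deltas[f] < 0]
    total = sum(deltas.values())
    if total == 0:
        return None, backwards
    pct = {f: 100.0 * deltas[f] / total for f in FIELDS}
    return pct, backwards


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as fh:
        return fh.readline()
```

Calling `read_cpu_line()` twice a few seconds apart, parsing both lines, and passing them to `cpu_percentages` would show whether any counter is reported as going backwards on an affected instance.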

user1167223
  • EC2 instances give you dedicated resources -- you aren't competing with neighbors on the same host for CPU time, so it should not be possible for an overload to have this impact. Stopping the instance and starting it again (via the console) will move it to a different physical host within the availability zone, if you want to try that. Or, first, just reboot it to see whether this disappears, which would point to some kind of strange problem with an in-memory data structure. What is the instance type? Which Linux distro? Are the kernel and packages up to date? – Michael - sqlbot Jun 06 '18 at 10:46
  • Thanks, a reboot seems to have cleared it for now. We will keep an eye on it. – user1167223 Jun 06 '18 at 15:37

0 Answers