I have an Amazon EC2 t2.medium instance that reports very strange CPU steal time values in top: mostly large negative numbers, along with idle CPU percentages far above 100%.

Is there anything that explains such strange numbers? Are we missing a system update or bug fix?

top - 13:36:23 up 51 days,  2:49,  1 user,  load average: 0.35, 0.15, 0.12
Tasks:  97 total,   1 running,  96 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,56000.0%id,200.0%wa,  0.0%hi,  0.0%si,-3849124577280.0%st
Mem:   4047964k total,  3905488k used,   142476k free,    29760k buffers
Swap:        0k total,        0k used,        0k free,   269332k cached
centic

1 Answer

I think I found it myself. It appears to be a bug in the kernel's Xen/KVM area: it has occurred since Linux kernel 4.8 and was fixed in Linux kernel 4.11. We run 4.9.x, so we are affected. However, this is not an actual steal situation, just incorrect reporting caused by a number overflow inside the kernel.

See https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest/ for a very nice writeup.
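
To check whether a machine shows the same symptom, one option is to compare two samples of the aggregate `cpu` line in `/proc/stat` and look for counters that went backwards. The following is a minimal sketch, not taken from the linked writeup: the helper names `parse_cpu_line`, `read_cpu_counters` and `backward_counters` are made up for illustration, and it assumes the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal). These counters are supposed to be cumulative, so a decreasing `steal` value between two samples suggests a reporting problem rather than real steal time.

```python
# Hypothetical helpers for spotting a non-monotonic steal counter.
# Assumes the standard Linux /proc/stat layout for the first "cpu" line.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(p) for p in parts[1:1 + len(FIELDS)])))


def read_cpu_counters(path="/proc/stat"):
    """Read the current counters; the aggregate line is the first line."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def backward_counters(prev, curr):
    """Return the names of counters that decreased between two samples."""
    return [name for name in FIELDS if curr[name] < prev[name]]
```

Calling `read_cpu_counters()` twice a few seconds apart and passing both results to `backward_counters` returns an empty list on a healthy system. If `steal` shows up in the list, a tool that computes percentages from the difference between samples would be working with a negative delta, which is one plausible source of values like the ones in the `top` output above.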

centic