
Every night we run hwinfo to dump the current hardware information into a file.

Today we had a constant load of 5:

top - 13:24:32 up 5 days, 19:35,  3 users,  load average: 5,08, 5,20, 5,31
Tasks: 258 total,   1 running, 257 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,0 us,  0,2 sy,  0,1 ni, 99,3 id,  0,4 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem:   8174748 total,  7394184 used,   780564 free,   427664 buffers
KiB Swap:  2097148 total,   856972 used,  1240176 free,  5193064 cached

There were 5 hwinfo commands running: one from this night, one from yesterday, one from the night before yesterday, and so on.

I could not see these commands with top or iotop.

Which tool could I use instead of top, so that I can see the hanging hwinfo processes next time?

After killing the hwinfo processes, the load settled immediately.

If you know why the load created by the hwinfo command is not visible in top, then please leave a comment.

Thank you.

guettli

1 Answer

top sorted by CPU may not surface a process that is stuck in I/O (or something similar) and using very little CPU. Try pressing i in top to toggle the idle filter, which hides sleeping or idle tasks.

As for hwinfo, it is hard to say without seeing what it is stuck on. If you see this again, try tracing the process to see what it is doing. One way: `ltrace -S -o /tmp/hwinfo -p $(pidof hwinfo)`. The last file it read may be interesting, for example.
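If attaching a tracer is not possible, the kernel's view of the process can be read from /proc without attaching to it. A minimal sketch, assuming a Linux /proc filesystem; the helper name `show_wait` is made up for illustration, and `/proc/PID/stack` usually needs root and may not exist on every kernel:

```shell
# Print the scheduler state and kernel wait channel of one process
# without attaching a tracer to it. (show_wait is a hypothetical helper.)
show_wait() {
    pid=$1
    # "State:" shows R (running), S (sleeping), D (uninterruptible sleep), ...
    grep '^State:' "/proc/$pid/status"
    # wchan names the kernel function the task is sleeping in, if known.
    printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
    # Kernel stack: usually root-only and not available on every kernel.
    cat "/proc/$pid/stack" 2>/dev/null || true
}
```

For example, `show_wait 20089` for the PID of a hanging hwinfo.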

Load average is roughly the number of tasks that are ready to run; on Linux it also counts tasks in uninterruptible sleep (state D), typically waiting on I/O. The load average is only a measurement and does not itself create load; the tasks it counts may be doing a lot or a little. However, a high load average usually indicates some resource shortage or contention. Whether that is a symptom of an actual problem is up to you. In your case, five hwinfo processes had piled up before you noticed.
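A high load with an almost idle CPU, as in the top output in the question, is typical of tasks blocked in uninterruptible sleep, which use no CPU and so never rise in a CPU-sorted top. A minimal sketch for listing them with ps, assuming the procps version of ps found on most Linux systems; the function names `filter_dstate` and `list_dstate` are made up for illustration:

```shell
# Keep the header line plus every row whose second column (STAT)
# starts with "D" (uninterruptible sleep). Reads ps output on stdin.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List all processes with state, kernel wait channel, elapsed time and
# command line, then keep only the D-state ones.
list_dstate() {
    ps -eo pid,stat,wchan:32,etime,args | filter_dstate
}
```

Running `list_dstate` while the load is high should show any blocked processes together with how long they have been running, which makes leftovers from previous nights easy to spot.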

John Mahowald
  • Today hwinfo was hanging again. I did `ltrace -S -o /tmp/hwinfo -p 20089` and then ltrace was hanging, too. The file "/tmp/hwinfo" was empty. I tried to kill the ltrace process, but this failed. Only "kill -9" terminated the ltrace process. – guettli Nov 13 '18 at 09:00
  • Review syslog and dmesg for anything interesting. Consider calling your hwinfo via ltrace so it attaches immediately: `ltrace -S -o /tmp/hwinfo hwinfo` – John Mahowald Nov 13 '18 at 13:51