1

I've this server which is basically idle, and yet has a high load average.

  • Hardware : 4 processor PowerPC
  • Over 4GB of free RAM
  • Top says CPUs are 99.9% idle
  • Virtually no disk I/O
  • Debian Squeeze, default install except I'm using ext4

Here's the output of a few commands :

uname -a

Linux box 2.6.32-5-powerpc64 #1 SMP Tue Mar 8 02:01:42 UTC 2011 ppc64 GNU/Linux

top

top - 14:08:57 up  1:58,  1 user,  load average: 2.68, 2.45, 2.29
Tasks: 105 total,   1 running, 104 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4987256k total,  4965484k used,    21772k free,    16540k buffers
Swap: 24414028k total,        0k used, 24414028k free,  4781172k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2606 myself    20   0  3276 1340 1076 R    0  0.0   0:00.62 top
    1 root      20   0  2560  844  740 S    0  0.0   0:00.65 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0
    4 root      20   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0  

uptime

 14:09:23 up  1:58,  1 user,  load average: 2.54, 2.43, 2.28

iostat -d 2 -m

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb               0.00         0.00         0.00          0          0
sda               1.50         0.00         0.02          0          0

free -m

             total       used       free     shared    buffers     cached
Mem:          4870       4853         17          0         16       4669
-/+ buffers/cache:        167       4702
Swap:        23841          0      23841

ps axf

  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
    3 ?        S      0:00  \_ [migration/0]
    4 ?        S      0:00  \_ [ksoftirqd/0]
    5 ?        S      0:00  \_ [watchdog/0]
    6 ?        S      0:00  \_ [migration/1]
    7 ?        S      0:00  \_ [ksoftirqd/1]
    8 ?        S      0:00  \_ [watchdog/1]
    9 ?        S      0:00  \_ [migration/2]
   10 ?        S      0:00  \_ [ksoftirqd/2]
   11 ?        S      0:00  \_ [watchdog/2]
   12 ?        S      0:00  \_ [migration/3]
   13 ?        S      0:00  \_ [ksoftirqd/3]
   14 ?        S      0:00  \_ [watchdog/3]
   15 ?        S      0:00  \_ [events/0]
   16 ?        S      0:00  \_ [events/1]
   17 ?        S      0:00  \_ [events/2]
   18 ?        S      0:00  \_ [events/3]
   19 ?        S      0:00  \_ [cpuset]
   20 ?        S      0:00  \_ [khelper]
   21 ?        S      0:00  \_ [netns]
   22 ?        S      0:00  \_ [async/mgr]
   23 ?        S      0:00  \_ [pm]
   24 ?        S      0:00  \_ [sync_supers]
   25 ?        S      0:00  \_ [bdi-default]
   26 ?        S      0:00  \_ [kintegrityd/0]
   27 ?        S      0:00  \_ [kintegrityd/1]
   28 ?        S      0:00  \_ [kintegrityd/2]
   29 ?        S      0:00  \_ [kintegrityd/3]
   30 ?        S      0:00  \_ [kblockd/0]
   31 ?        S      0:00  \_ [kblockd/1]
   32 ?        S      0:00  \_ [kblockd/2]
   33 ?        S      0:00  \_ [kblockd/3]
   38 ?        S      0:00  \_ [khungtaskd]
   39 ?        S      0:04  \_ [kswapd0]
   40 ?        SN     0:00  \_ [ksmd]
   41 ?        S      0:00  \_ [aio/0]
   42 ?        S      0:00  \_ [aio/1]
   43 ?        S      0:00  \_ [aio/2]
   44 ?        S      0:00  \_ [aio/3]
   45 ?        S      0:00  \_ [crypto/0]
   46 ?        S      0:00  \_ [crypto/1]
   47 ?        S      0:00  \_ [crypto/2]
   48 ?        S      0:00  \_ [crypto/3]
  134 ?        S      0:00  \_ [ksuspend_usbd]
  135 ?        S      0:00  \_ [kmmcd]
  137 ?        S      0:00  \_ [ata/0]
  138 ?        S      0:00  \_ [ata/1]
  139 ?        S      0:00  \_ [ata/2]
  140 ?        S      0:00  \_ [ata/3]
  141 ?        S      0:00  \_ [ata_aux]
  142 ?        S      0:00  \_ [scsi_eh_0]
  143 ?        S      0:00  \_ [scsi_eh_1]
  144 ?        S      0:00  \_ [scsi_eh_2]
  145 ?        S      0:00  \_ [scsi_eh_3]
  150 ?        S      0:00  \_ [khubd]
  174 ?        S      0:00  \_ [usbhid_resumer]
  227 ?        D      0:00  \_ [kwindfarm]
  239 ?        S      0:00  \_ [jbd2/sda3-8]
  240 ?        S      0:00  \_ [ext4-dio-unwrit]
  241 ?        S      0:00  \_ [ext4-dio-unwrit]
  242 ?        S      0:00  \_ [ext4-dio-unwrit]
  243 ?        S      0:00  \_ [ext4-dio-unwrit]
  424 ?        S      0:00  \_ [nouveau/0]
  425 ?        S      0:00  \_ [nouveau/1]
  426 ?        S      0:00  \_ [nouveau/2]
  427 ?        S      0:00  \_ [nouveau/3]
  459 ?        S      0:00  \_ [phy0]
  474 ?        S      0:00  \_ [flush-8:0]
  493 ?        S      0:00  \_ [ttm_swap]
  588 ?        S      0:00  \_ [bluetooth]
  635 ?        S      0:00  \_ [firewire_sbp2]
  693 ?        S      0:00  \_ [jbd2/sda5-8]
  694 ?        S      0:00  \_ [ext4-dio-unwrit]
  695 ?        S      0:00  \_ [ext4-dio-unwrit]
  696 ?        S      0:00  \_ [ext4-dio-unwrit]
  697 ?        S      0:00  \_ [ext4-dio-unwrit]
 1694 ?        S      0:02  \_ [jbd2/sdb1-8]
 1695 ?        S      0:00  \_ [ext4-dio-unwrit]
 1696 ?        S      0:00  \_ [ext4-dio-unwrit]
 1697 ?        S      0:00  \_ [ext4-dio-unwrit]
 1698 ?        S      0:00  \_ [ext4-dio-unwrit]
    1 ?        Ss     0:00 init [2]  
  303 ?        S<s    0:00 udevd --daemon
  368 ?        S<     0:00  \_ udevd --daemon
 1385 ?        S<     0:00  \_ udevd --daemon
  929 ?        Sl     0:00 /usr/sbin/rsyslogd -c4
  998 ?        Ss     0:00 /usr/sbin/atd
 1042 ?        Ss     0:00 /usr/sbin/cron
 1255 ?        Ss     0:00 /usr/sbin/exim4 -bd -q30m
 1286 tty2     Ss+    0:00 /sbin/getty 38400 tty2
 1287 tty3     Ss+    0:00 /sbin/getty 38400 tty3
 1288 tty4     Ss+    0:00 /sbin/getty 38400 tty4
 1289 tty5     Ss+    0:00 /sbin/getty 38400 tty5
 1290 tty6     Ss+    0:00 /sbin/getty 38400 tty6
 1300 ?        Ss     0:00 dhclient -v -pf /var/run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
 1384 tty1     Ss+    0:00 /sbin/getty 38400 tty1
 2113 ?        Ss     0:00 /usr/sbin/apache2 -k start
 2116 ?        S      0:00  \_ /usr/sbin/apache2 -k start
 2118 ?        Sl     0:00  \_ /usr/sbin/apache2 -k start
 2119 ?        Sl     0:00  \_ /usr/sbin/apache2 -k start
 2577 ?        Ss     0:00 /usr/sbin/sshd
Ecco
  • 121
  • 2
  • 7

3 Answers3

2

Try to upgrade/downgrade your kernel. Are several problems with the sheduler on different kernels:

  1. http://lkml.indiana.edu/hypermail//linux/kernel/1010.1/02354.html
  2. http://www.gossamer-threads.com/lists/linux/kernel/1169420
Sacx
  • 2,581
  • 16
  • 13
  • Hmm, that's interesting. Thing is, I'm running Debian stable, so I find it weird that such a noticeable bug could have slept through – Ecco Mar 24 '11 at 14:04
  • It seems it appears only sometimes in some hardware configurations ... is very hard to detect such bugs. – Sacx Mar 24 '11 at 14:06
  • Agree. Considering your arch. – Dmytro Leonenko Mar 24 '11 at 14:21
  • Agree as well, though I'll also add that if your system's performance isn't impacted you may want to play "Ignore it and it will go away" with this -- Definitely report the issue to Debian and see if they can help get a fix into the main kernel tree though. – voretaq7 Mar 24 '11 at 14:48
2

I just installed Ubuntu on my Quad G5, and I started noticing the exact same issue as well with 2.6.35-28-powerpc64-smp (kernel from Ubuntu 10.10). My userland is up-to-date with Ubuntu 11.04, but the kernel is from 10.10 due to bugs in the newer kernel.

Running top in batch mode, the only thing I ever see in wait is kwindfarm. Run 'top -b -i' for a while... do you see the same thing? My hunch is that kwindfarm is the problem, but I don't want to go messing with kwindfarm and cause the fans to turn on full blast which would annoy/confuse my officemate since I'm remote right now.

Here are my list of suspect kernel modules ... try removing them and seeing if the problem goes away:

windfarm_smu_sensors 8567 1 windfarm_smu_controls 7645 8 windfarm_pm112 17416 0 windfarm_smu_sat 8512 9 windfarm_pm112,[permanent] windfarm_max6690_sensor 5628 1 windfarm_lm75_sensor 6083 1 windfarm_pid 3577 1 windfarm_pm112 windfarm_cpufreq_clamp 3829 1 windfarm_core 16091 7 windfarm_smu_sensors,windfarm_smu_controls,windfarm_pm112,windfarm_smu_sat,windfarm_max6690_sensor,windfarm_lm75_sensor,windfarm_cpufreq_clamp

EDIT: This is the likely suspect. A bit more googling turned up this thread from lkml: http://www.gossamer-threads.com/lists/linux/kernel/860721

  • That is very interesting. However your LKML thread is really old, so I doubt this is it since I hadn't this issue in the previous Debian version (which had a kernel more recent than your link). Out of curiosity, what's that bug in the newer kernel ? – Ecco Mar 24 '11 at 20:16
  • The new kernel bug I am referring to is in these two reports: https://bugs.launchpad.net/ubuntu/+source/linux-ports-meta/+bug/739913 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/714165 – Jeremy Huddleston Sequoia Apr 04 '11 at 02:30
  • Your info seems to indicate that it is kwindfarm causing the load. What kernel did you have in which you didn't see this problem? Does it go away when you unload the windfarm modules? – Jeremy Huddleston Sequoia Apr 04 '11 at 02:32
0

I ran into this problem as well, and the culprit is the watchdog module you have enabled. I'm assuming it's a software watchdog, rather than a hardware one; and in theory, it's a nice idea, but in practice it's totally useless. if you really need a watchdog, get a hardware one; one that can actually reboot the box if needed, since a software one will stop working if the box freezes/panics.

adeel
  • 1