
This isn't a real problem, but I suspect it could point to something more serious: I recently upgraded to Linux kernel 2.6.36, and the load average never drops below 1.0, no matter how many tasks are running, even though CPU load is 0% and no processes are waking up.

I wonder what could be causing this, and whether there's a good way to debug this "problem".

I'm hoping it won't turn out to be anything more serious (like some silent piece of the kernel causing wakeups). For now, the only real problem is that a load average that never falls below 1.0 doesn't look very healthy on graphs.

Could this be caused by the tickless kernel?

Edited by Kyle Brandt; asked by exa

1 Answer


I've seen this with tasks that are stalled in the scheduler, usually in the middle of a system call. If you have non-vanilla kernel modules, start there, even if they are included in the kernel tree. Kernel components that have a user-space counterpart are a common culprit: the user-space daemon hangs waiting on an external event, which blocks the kernel code in between, which in turn blocks any program that asks the kernel for something.
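Why would a stalled task show up as load at all? The Linux kernel samples the number of runnable (R) plus uninterruptible (D) tasks roughly every 5 seconds and folds that count into an exponentially damped average. The sketch below reproduces the 1-minute figure in floating point (the kernel itself uses fixed-point arithmetic). The function name `simulate_loadavg` is hypothetical, and holding the task count constant is a simplifying assumption.

```shell
# Hypothetical helper: simulate the 1-minute load average for a constant
# number of active (R + D) tasks, sampled every 5 seconds.
# Floating-point approximation; the kernel uses fixed-point arithmetic.
#
# Usage: simulate_loadavg ACTIVE_TASKS SAMPLES
simulate_loadavg() {
    awk -v n="$1" -v samples="$2" 'BEGIN {
        e = exp(-5 / 60)   # decay per 5-second sample for a 1-minute average
        load = 0
        for (i = 0; i < samples; i++)
            load = load * e + n * (1 - e)
        printf "%.2f\n", load
    }'
}

# One task stuck in D state, zero CPU use, observed for 10 minutes (120 samples):
simulate_loadavg 1 120   # prints 1.00
```

After one minute (`simulate_loadavg 1 12`) the figure is about 0.63, and it keeps creeping toward 1.0 for as long as the task stays stuck, which matches a graph whose floor never drops below 1.0.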

Network-based filesystems, and not just those that communicate over Ethernet, are prime suspects.
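One way to look for network filesystems is to check the filesystem types listed in /proc/mounts. The helper below, `list_network_mounts`, is hypothetical, and its list of network filesystem types is an assumption that is probably incomplete for your system (FUSE-based filesystems in particular appear under many different type names).

```shell
# Hypothetical helper: print "fstype mountpoint" for every mount whose
# filesystem type is in an assumed list of network filesystems.
# /proc/mounts fields: device mountpoint fstype options dump pass
# (spaces in mount points are escaped as \040, so awk field splitting works).
list_network_mounts() {
    awk '$3 ~ /^(nfs|nfs4|cifs|smbfs|ncpfs|afs|coda|9p|ceph|fuse\.sshfs)$/ {
        print $3, $2
    }' "${1:-/proc/mounts}"
}
```

Run it with no argument to read /proc/mounts, or pass another file in the same format. Any mount it prints is worth checking for hung servers or stale connections.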

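Check for processes that are not in the runnable state with ps -eo user,pid,stat,pcpu,args | grep -v " R". Pay particular attention to any in state D (uninterruptible sleep): Linux counts those toward the load average just like runnable tasks, even though they use no CPU.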

USER       PID STAT %CPU COMMAND
daemon     676 Ss    0.0 portmap
statd      752 Ss    0.0 rpc.statd -L
syslog     872 Sl    0.0 rsyslogd -c4
102        895 Ss    0.0 dbus-daemon --system --fork
avahi      934 S     0.0 avahi-daemon: running [faustus.local]
daemon    1082 Ss    0.0 atd
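A variant of this check keeps only tasks in state D and adds the WCHAN column, which shows the kernel function each task is sleeping in and may point straight at the stuck subsystem. Filtering on the STAT field with awk also avoids `grep -v " R"` matching that string anywhere else in a line. This is a sketch: `filter_d_state` is a hypothetical name, and it assumes STAT is the third column, as in the ps format shown in the usage line after the block.

```shell
# Hypothetical filter: keep the ps header line plus every row whose
# STAT column (field 3) starts with D (uninterruptible sleep).
# Reads ps output on stdin.
filter_d_state() {
    awk 'NR == 1 || $3 ~ /^D/'
}
```

Usage with procps ps: ps -eo user,pid,stat,wchan:32,args | filter_d_state. Since the load average is an average, a task that is briefly in D will not pin it; run the check a few times and look for the same PID showing up repeatedly.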

You can decode the STAT column with this table from the ps man page:

  D    Uninterruptible sleep (usually IO)
  R    Running or runnable (on run queue)
  S    Interruptible sleep (waiting for an event to complete)
  T    Stopped, either by a job control signal or because it is being traced.
  W    paging (not valid since the 2.6.xx kernel)
  X    dead (should never be seen)
  Z    Defunct ("zombie") process, terminated but not reaped by its parent.

  For BSD formats and when the stat keyword is used, additional characters may be displayed:
  <    high-priority (not nice to other users)
  N    low-priority (nice to other users)
  L    has pages locked into memory (for real-time and custom IO)
  s    is a session leader
  l    is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
  +    is in the foreground process group
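To decode a whole STAT value such as Ssl in one go, a small helper can look up each character in the table above. The function name `decode_stat` is hypothetical, the descriptions are shortened from the table, and any character not listed there is reported as unknown.

```shell
# Hypothetical helper: print one line per character of a ps STAT value,
# using shortened descriptions from the ps man page table.
# awk string keys are case-sensitive, so "S" (sleep) and "s" (session
# leader) are kept apart, as are "L" and "l".
decode_stat() {
    printf '%s\n' "$1" | awk '
    BEGIN {
        d["D"] = "uninterruptible sleep (usually IO)"
        d["R"] = "running or runnable (on run queue)"
        d["S"] = "interruptible sleep (waiting for an event to complete)"
        d["T"] = "stopped (job control signal or traced)"
        d["W"] = "paging (not valid since 2.6.xx)"
        d["X"] = "dead"
        d["Z"] = "defunct (zombie)"
        d["<"] = "high priority"
        d["N"] = "low priority"
        d["L"] = "pages locked into memory"
        d["s"] = "session leader"
        d["l"] = "multi-threaded"
        d["+"] = "in foreground process group"
    }
    {
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            print c ": " ((c in d) ? d[c] : "unknown")
        }
    }'
}
```

For example, decode_stat Ss prints "S: interruptible sleep (waiting for an event to complete)" followed by "s: session leader".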
Answered by zerolagtime
  • +1, lots of useful information. Thanks for sharing; this is going right into my own notes. – Oneiroi Nov 15 '10 at 11:05
  • Gonna check all of that once more, but I'm afraid it doesn't really solve the problem; I've already checked them all. Good tip on the filesystem thingy, and I realized that I have a rather old udev. Gonna try upgrading, will see after that. – exa Nov 16 '10 at 07:43