4

I have a debian VM I'm running on an ESX4.0 server. This VM hosts a number of users, each running an irssi session inside a screen instance.

This is working quite well, except for one user. For some reason, this irssi session keeps peaking at 100% CPU usage (while continuing to work normally). It's not running any scripts that aren't loaded on other irssi sessions that are behaving more sanely.

The 100% cpu doesn't start immediately, but usually within a few hours after starting up. It never goes away.

How would you go about debugging the source of this problem? I tried using strace on it, and didn't see anything immediately obvious, though there is certainly a different pattern to the calls at start, and after it peaks.

At start, here's the histogram of calls over 30 seconds:

time: 334
gettimeofday: 317
poll: 122
read: 9
write: 2
restart_syscall: 1

Once the CPU starts pegging, I get this:

gettimeofday: 230176
read: 115122
poll: 115106
time: 531
write: 107
waitpid: 38
_llseek: 2
ioctl: 2
fstat64: 2
open: 2
close: 2
fcntl64: 2
unlink: 1

A histogram of 'ltrace -S' of the pegging process shows these as the top entries:

SYS_read: 61731
g_io_channel_read: 34115
SYS_gettimeofday: 24662
SYS_poll: 12344
fflush: 6828
g_main_context_iteration: 6823
__ctype_toupper_loc: 4025
g_strcasecmp: 3757
g_hash_table_lookup_extended: 3325
g_direct_hash: 3068

What am I missing? What's the next step to solving this?

HopelessN00b
  • 53,795
  • 33
  • 135
  • 209
zigdon
  • 471
  • 1
  • 4
  • 8

1 Answers1

1

I think you need to figure out what it's read()ing and poll()ing so much. Since it isn't constantly opening and closing new files — only two every 30s, apparently — lsof should tell you that.

If it's read()ing from a pipe, as here:

COMMAND    PID       USER   FD      TYPE             DEVICE     SIZE   NODE NAME
irssi     4993       user    5w     FIFO                0,6         2941908 pipe

then run lsof as root against all processes and grep for the pipe, either the name of a named pipe, or the node number of an unnamed pipe, in its output. (In this case, 2941908.) This should show you irssi and whatever process is on the other end of the pipe.

If the pipe doesn't have another end … uh, I'm not sure. Maybe strace one of the processes from start until the problem happens and figure out what's going on with the pipe. It might make sense to limit the output with a '-e trace=' flag to strace, but I can't think of a good set of things to restrict to off the top of my head.

wfaulk
  • 6,878
  • 7
  • 46
  • 75
  • All the read() calls are on FD5, which lsof shows as: COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME irssi 4993 user 5w FIFO 0,6 2941908 pipe I don't know how to tell what's on the other end of the pipe though? – zigdon Sep 24 '09 at 19:00
  • No luck :/ irc$ sudo lsof | grep 2941908 irssi 4993 user 5w FIFO 0,6 2941908 pipe – zigdon Sep 24 '09 at 19:58
  • Might be noted that none of the other irssi sessions have *any* pipes open, only this one user. – zigdon Sep 24 '09 at 19:59
  • Strace is the way to go, but you don't need to do it in advance, you can attach afterwards. – Joris Aug 15 '10 at 08:16