
I'm not great with Linux, but I do enjoy the limited exposure I have. We use a virtualised Red Hat server for software development, and one of the things I have to do is check who's hogging the CPU and see how we can avoid it. I can decipher the users and processes easily enough, but if the system column is using (say) 50.8% of CPU, how can I find out what it's doing, and try to minimise it? I gather it's all internal kernel stuff, but it never seems to drop below 20%, and it's usually higher. Here's a sample output from top:

245 processes: 232 sleeping, 8 running, 4 zombie, 1 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   47.9%    0.0%   50.8%   0.3%     0.7%    0.0%    0.0%
           cpu00   68.9%    0.0%   29.2%   0.3%     1.3%    0.0%    0.0%
           cpu01   26.8%    0.0%   72.5%   0.3%     0.1%    0.0%    0.0%
Mem:  3816924k av, 3795652k used,   21272k free,       0k shrd,  266548k buff
                   2836408k actv,  541432k in_d,   58992k in_c
Swap: 2097136k av,       0k used, 2097136k free                 3080380k

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
23447 romanmur  25   0  6896 6896   356 R    33.4  0.1   0:09   1 bzip2
23961 tomjose   25   0  1076 1076   960 S     5.4  0.0   0:00   1 make_headers.
23887 paulhewl  25   0  3996 3996   348 R     4.4  0.1   0:00   0 bzip2
 3902 romanmur  15   0  4112 4108  1628 S     2.1  0.1   0:23   0 smbd
23446 romanmur  15   0   708  708   616 S     0.5  0.0   0:00   1 tar
24541 damianwy  24   0  1776 1776   960 S     0.4  0.0   0:00   0 sh
24493 root      24  -1  1372 1372  1100 R <   0.3  0.0   0:00   1 X
 1771 root      15   0  1108 1108   496 S     0.2  0.0  14:26   1 cmaperfd
23262 root      15   0  1308 1308   884 R     0.2  0.0   0:03   1 top
15209 paulhewl  15   0   488  488   428 S     0.2  0.0   0:00   1 tee
 2719 richardp  15   0  6108 6104  2108 S     0.1  0.1   1:02   0 smbd
23857 paulhewl  22   0  1900 1900   652 S     0.1  0.0   0:00   1 make
23886 paulhewl  19   0   712  712   624 S     0.1  0.0   0:00   0 tar
24431 root      23   0  1052 1052   936 S     0.1  0.0   0:00   1 startx
24521 paulhewl  24   0  1056 1056   896 S     0.1  0.0   0:00   1 sh
24523 paulhewl  25   0   416  416   344 R     0.1  0.0   0:00   1 mips-linux-gc

Thanks!

donrosco

1 Answer


I sympathise with the predicament.

top is wonderful but also a little tricky to understand, mainly because a system is so complex. For example, on a dual-core CPU, one command showing 100% CPU in top usually isn't a problem: it just means that command is fully using one core, and the other core should be able to serve other commands without delay.
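To see that per-core split outside of top, here is a minimal Python sketch that computes the same kind of breakdown from two snapshots of /proc/stat. It assumes a Linux /proc filesystem and the usual column order (user, nice, system, idle, iowait, irq, softirq). The function names are my own, and any columns beyond those seven are ignored.

```python
import time

# Column order of the "cpu" lines in /proc/stat (assumed; later columns such
# as steal and guest are ignored in this sketch).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_proc_stat(text):
    """Return {cpu_name: [tick counters]} for each 'cpu*' line."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu"):
            counters[parts[0]] = [int(v) for v in parts[1:len(FIELDS) + 1]]
    return counters


def cpu_percentages(before, after):
    """Percentage of elapsed ticks spent in each state, per CPU."""
    result = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue
        deltas = [n - o for n, o in zip(new, old)]
        total = sum(deltas)
        if total <= 0:
            continue  # no ticks elapsed between the snapshots
        result[name] = {f: 100.0 * d / total for f, d in zip(FIELDS, deltas)}
    return result


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and compare them."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return cpu_percentages(before, after)
```

Calling sample() on the server and looking at the "system" entry for "cpu0" or "cpu1" should give roughly the figure top reports in its system column for that core over the same interval.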

A good starting place for system load (at a glance) is the three numbers at the top right of top's first line (run w if your top doesn't show them). The w man page describes them as the system load averages for the past 1, 5, and 15 minutes, and the same applies to top. They're basically how long it will be before the system can begin to process the next command in the queue ... 1.01 seconds, 3.26 seconds etc. To begin with I'd pay more attention to those values than to the other screeds of detail when judging whether your system is busy or not.
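As the comments on this answer point out, the load average counts active tasks rather than seconds, so it is easiest to judge relative to the number of CPUs. A minimal Python sketch, assuming a Linux /proc/loadavg layout; the helper names are hypothetical:

```python
import os


def parse_loadavg(text):
    """Extract the 1, 5 and 15 minute averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def load_per_cpu(loadavg, cpu_count):
    """Scale each average by the CPU count.

    Values well above 1.0 suggest more active tasks than CPUs to run them.
    """
    return tuple(value / cpu_count for value in loadavg)


def current_load_per_cpu():
    """Read the live values; os.getloadavg() is only available on Unix."""
    return load_per_cpu(os.getloadavg(), os.cpu_count() or 1)
```

On the two-CPU machine in the question, a 1-minute load of 2.0 would come out as 1.0 per CPU, meaning both CPUs have about one task each to work on.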

Your output above shows that bzip2 is the highest user of CPU at the time you took the snapshot.
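top's %CPU figure combines user and kernel time for each process. One way to see which processes contribute to the system column is to compare the utime and stime counters in /proc/<pid>/stat. The following is a rough Python sketch, assuming the standard Linux field layout (utime is field 14, stime is field 15); the function names are my own.

```python
import os


def parse_pid_stat(text):
    """Pull pid, command name, utime and stime out of /proc/<pid>/stat content."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    start = text.index("(")
    end = text.rindex(")")
    rest = text[end + 1:].split()  # rest[0] is field 3 (state)
    return {
        "pid": int(text[:start]),
        "comm": text[start + 1:end],
        "utime": int(rest[11]),  # field 14: ticks spent in user mode
        "stime": int(rest[12]),  # field 15: ticks spent in kernel mode
    }


def system_share(proc):
    """Fraction of the process's CPU time that was spent in the kernel."""
    total = proc["utime"] + proc["stime"]
    return proc["stime"] / total if total else 0.0


def scan(proc_root="/proc"):
    """Return parsed entries for all readable processes, most kernel time first."""
    entries = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                entries.append(parse_pid_stat(f.read()))
        except (OSError, ValueError, IndexError):
            continue  # process exited or the entry could not be parsed
    return sorted(entries, key=lambda p: p["stime"], reverse=True)
```

Note that these counters are cumulative since each process started, so a long-running daemon can outrank a short-lived bzip2. Taking two scans a few seconds apart and comparing the stime values gives a better picture of what is using kernel time right now. Kernel work that isn't charged to any process will not show up here.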

I'd recommend some further reading on interpreting a system's IO, RAM and CPU monitoring tools.

Here are some starting points to hopefully ease the hunting: Top columns, SysStat and other tools, and Using top more efficiently.

Jonathan Ross
  • The load average is the number of processes in the run queue, not the number of seconds until the next command can process. – Cakemox Apr 12 '11 at 10:27
  • @Cakemox - You're absolutely right. I've definitely read that (in a printed book) recently but after some looking `They represent the average number of active tasks in the last 1, 5 or 15 minutes.` is the simplest way of expressing it. Here's a deeper explanation for those interested: `http://juliano.info/en/Blog:Memory_Leak/Understanding_the_Linux_load_average` – Jonathan Ross Apr 12 '11 at 10:36
  • No problem at all. Glad it helped. – Jonathan Ross Apr 14 '11 at 09:07