0

All,

We are currently running a CentOS VM on our server via VMware, and its performance degrades over time. Right after the server is created it is very fast, but over time it becomes unbearably slow.

I am a bit confused because we aren't using any swap and our load is not terrible.

Here is my top output:

top - 15:38:49 up  1:10, 13 users,  load average: 6.94, 6.92, 6.31
Tasks: 165 total,   7 running, 158 sleeping,   0 stopped,   0 zombie
Cpu(s): 50.0%us, 50.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16189104k total, 14704772k used,  1484332k free,    61140k buffers
Swap:  4095992k total,        0k used,  4095992k free,  1201532k cached

The top CPU-intensive process is:

 PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND    
 20   0  1969m 1.1g  10m S  2.5  7.4   3:39.35 java 

I am sure it is something silly that I am missing, but at this point it takes 20 seconds to su to another user.

Chopper3
XanderLynn

6 Answers

3

If you have strace installed (yum install strace), can you find a command that is slow (you mentioned su in your post) and run it under strace -cf:

# strace -F -c su - gonzo -c exit
...
Process 3583 detached
Process 3562 resumed
Process 3563 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.10    0.291882        7484        39        18 waitpid
  2.01    0.006160         474        13           execve
  0.77    0.002359          24        98           munmap
  0.75    0.002310         110        21           clone
  0.32    0.000973          24        41           mprotect
  0.19    0.000586           3       194           rt_sigaction
  0.18    0.000556           3       211           read
  0.16    0.000497           2       263           mmap2
  0.15    0.000471          43        11           write
  0.10    0.000301           2       184         2 open
  0.05    0.000151           0       418           rt_sigprocmask
  0.04    0.000119           7        17           getrlimit
  0.04    0.000116           1       157           fstat64
  0.03    0.000101           1        75        23 access
  0.02    0.000065           0       270         5 close
  0.02    0.000061           1        98           fcntl64
  0.02    0.000052           2        23        22 connect
  0.01    0.000034           1        67        17 stat64
  0.01    0.000032           1        25           getuid32
  0.01    0.000031           2        18           sigreturn
  0.01    0.000030           1        37           brk
  0.01    0.000029           7         4           setreuid32
  0.00    0.000000           0         1           chdir
  0.00    0.000000           0         4           time
  0.00    0.000000           0         1           getpid
  0.00    0.000000           0         3           alarm
  0.00    0.000000           0         9           pipe
  0.00    0.000000           0         7           ioctl
  0.00    0.000000           0         1           umask
  0.00    0.000000           0        28           dup2
  0.00    0.000000           0         1           getppid
  0.00    0.000000           0         1           getpgrp
  0.00    0.000000           0         1           setsid
  0.00    0.000000           0         1           setrlimit
  0.00    0.000000           0         8           readlink
  0.00    0.000000           0         1           getpriority
  0.00    0.000000           0         1           setpriority
  0.00    0.000000           0         2           uname
  0.00    0.000000           0         2           _llseek
  0.00    0.000000           0         6           poll
  0.00    0.000000           0         1           getcwd
  0.00    0.000000           0        16           getgid32
  0.00    0.000000           0        16           geteuid32
  0.00    0.000000           0        16           getegid32
  0.00    0.000000           0         4           setregid32
  0.00    0.000000           0         1           setgroups32
  0.00    0.000000           0         1           setuid32
  0.00    0.000000           0         1           setgid32
  0.00    0.000000           0         6           getdents64
  0.00    0.000000           0        11           gettid
  0.00    0.000000           0        13           set_thread_area
  0.00    0.000000           0         3           keyctl
  0.00    0.000000           0        29           socket
  0.00    0.000000           0         2           send
  0.00    0.000000           0         6           sendto
  0.00    0.000000           0        12           recvfrom
------ ----------- ----------- --------- --------- ----------------
100.00    0.306916                  2500        87 total

You'll then be able to see which system calls are using up the time, which might give us a clue about what is causing the slowness.

strace -tT might also be useful: -t prefixes each line with the time of day and -T shows the time spent in each system call.

You can also attach strace to running processes (strace -p) and find out more about what they are doing.
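To pick out the most expensive calls from a long summary like the one above, a small filter can help. This is only a sketch: the function name top_syscalls and the file name su.trace are made up for illustration, and it assumes the summary has the same column layout as the output shown in this answer.

```shell
# Hypothetical helper: read a saved `strace -c` summary on stdin and print
# the N most time-consuming syscalls as "syscall seconds calls".
top_syscalls() {
    n="${1:-5}"
    awk '
        # Data rows start with a numeric "% time" field; this skips the
        # header, the dashed rules, "Process ..." lines and the total row.
        $1 ~ /^[0-9]+\.[0-9]+$/ && $NF != "total" {
            print $NF, $2, $4
        }
    ' | sort -k2,2nr | head -n "$n"
}
```

One way to use it would be to write the summary to a file with strace's -o option (strace -f -c -o su.trace su - gonzo -c exit) and then run top_syscalls 3 < su.trace.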

Question: if you stop all the java processes, does the load average start to come down?

gm3dmo
1

Install or update VMware Tools. Enable virtualization support in the BIOS of the physical server (the option is available if your CPU supports it). Which VMware virtualization product are you using? Check performance on both the guest (VM) and the host (VMware server). Please specify whether the top output is from the guest or the host. How much memory does the host have, and how much is assigned to the guest? Are you overcommitting memory across VMs? Is the host swapping?

Mircea Vutcovici
1

You have given your guest fewer vCPUs than your host machine has, haven't you? I suspect your guest has two vCPUs. How many does the host have?

Oversubscribing CPUs can cause this kind of behaviour.

Also, there's an option to reduce the tick rate in a CentOS VM guest that may help somewhat, although I don't think it's the root cause. Look at the first bullet point in section 3 of http://wiki.centos.org/Manuals/ReleaseNotes/CentOS5.1
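As a rough way to put a number on the CPU pressure inside the guest, the load average can be divided by the CPU count. This is a sketch only: load_per_cpu is a made-up name, and the figure of two vCPUs used in the example is the guess from this answer, not something the question confirms.

```shell
# Hypothetical helper: runnable tasks per CPU, given a load average
# and a CPU count.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Inside the guest, the inputs could come from /proc:
#   load_per_cpu "$(cut -d" " -f1 /proc/loadavg)" "$(grep -c ^processor /proc/cpuinfo)"
```

With the 1-minute load of 6.94 from the question and an assumed two vCPUs, load_per_cpu 6.94 2 gives 3.47, meaning several tasks are queued for each vCPU.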

xenny
0

Try the tools "iostat" and "vmstat". They give you a lot more information about what is happening. "sar" may help too. (You need to install the "sysstat" package to get iostat and sar.)

Please post the output of those programs here. Then we can help you more.
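If the vmstat output gets long, a single column can be averaged to make it easier to summarise. The following is a sketch under assumptions: vmstat_avg is a made-up name, and it relies on the usual vmstat layout where a header row names the columns (r, b, us, sy, id, wa and so on).

```shell
# Hypothetical helper: average one named column (e.g. sy, wa, r) of
# vmstat output read from stdin, locating the column by its header name.
vmstat_avg() {
    awk -v col="$1" '
        # Until the column is found, scan each row for the header name.
        idx == 0 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
        # Afterwards, accumulate numeric values; repeated headers are skipped.
        $idx ~ /^[0-9]+$/ { sum += $idx; n++ }
        END { if (n) printf "%.1f\n", sum / n }
    '
}
```

For example, vmstat 5 6 | vmstat_avg sy would average the system CPU column over six samples. Keep in mind that the first vmstat row reports averages since boot, so it can skew a short run.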

Another good thing is to do what "davey" told you.

Raffael Luthiger
0

I have an issue on a couple of machines running VMware Server whereby each VM slowly uses more and more CPU resource as time goes on. Stopping the VMs and restarting them solves the problem, though rebooting them or suspending and resuming them doesn't.

This is easiest to see on a low-spec server (an old P4) that runs three VMs providing basic web services: the graph at the bottom of this page shows the measured effect on CPU use over time, and at the bottom of this page you can see the effect as measured by "load average" readings. The effect is much less noticeable on the other machines I run VMware on because they are far more powerful overall. The effect seems proportional to the number of VMs running (i.e. the ghost load increases twice as quickly if twice as many VMs are running). Thus far, stopping and restarting the VMs has always solved the issue. Rebooting the host machine is not needed (though if the host is due a reboot for something like a kernel upgrade, it makes sense to coordinate this reboot with the VMs going down).

David Spillett
0

50% system CPU usage is very high, especially without any swapping or iowait. Something at kernel level is chewing up resources, most likely a bad driver. As a first step, I would run yum update to get a newer kernel.
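To confirm the user/system split independently of top, two samples of the aggregate "cpu" line in /proc/stat can be compared. This is a minimal sketch: cpu_split is a made-up name, and it assumes the standard field order on that line (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# Hypothetical helper: given two "cpu ..." lines from /proc/stat taken a
# few seconds apart, print the user and system share of elapsed CPU time.
cpu_split() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            user[NR] = $2; sys[NR] = $4; total[NR] = 0
            # Sum user..steal only; later guest fields are already in user.
            for (i = 2; i <= NF && i <= 9; i++) total[NR] += $i
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "no elapsed CPU time"; exit }
            printf "us=%.1f%% sy=%.1f%%\n",
                100 * (user[2] - user[1]) / dt,
                100 * (sys[2] - sys[1]) / dt
        }'
}

# Possible usage on the guest:
#   a="$(head -n1 /proc/stat)"; sleep 5; b="$(head -n1 /proc/stat)"
#   cpu_split "$a" "$b"
```

If the sy share stays near 50% while a single slow command such as su is running, that points toward kernel-side work rather than the java process.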

cagenut