4

I have noticed that some Linux-based virtual machines gradually become slower until I reboot them. At first I suspected a break-in and resource abuse, but after rebuilding several VMs from scratch and paying extra attention to security, I have ruled that out. I now suspect that the ESX hosts do not have enough RAM and are swapping out the memory of the less busy VMs.

How could I verify whether this is the case or not?

Specs:

ESX#1: ESX 3.5 8x3GHz, 32GB RAM. 7 vms
ESX#2: ESX 3.5 8x3GHz, 32GB RAM. 25 vms
70GB configured guest RAM in total over all the vms.

ESX1 occasionally raises an alert that memory is getting low, but it is only using 24GB of 32GB.
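As a rough sanity check on the figures above, the sketch below compares configured guest RAM with the physical RAM of both hosts. The function name is made up for illustration, and the calculation ignores hypervisor overhead and page sharing, so it only indicates whether overcommitment is possible, not whether host swapping is actually happening.

```python
def overcommit_ratio(configured_guest_gb, host_ram_gb):
    """Return configured guest RAM divided by total physical host RAM."""
    total_physical = sum(host_ram_gb)
    if total_physical <= 0:
        raise ValueError("total physical RAM must be positive")
    return configured_guest_gb / total_physical


# Figures from the specs above: 70GB configured, two hosts with 32GB each.
ratio = overcommit_ratio(70, [32, 32])
print(f"overcommit ratio: {ratio:.2f}")  # 70 / 64, prints 1.09
```

A ratio above 1.0 means the guests could, in the worst case, ask for more memory than the hosts physically have.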

sysadmin1138
Henno

3 Answers

3

I only have v4.1u1 hosts to hand, but in the VI Client (VIClient.exe), look at the host's Summary page under Resources. It should show how much memory is being used on that host.

Chopper3
  • This is where I checked that it's using 24GB of 32GB. I understand that this means that 8GB is still free? – Henno Aug 12 '11 at 13:22
2

It doesn't seem as though you have a RAM resource problem on your hosts. It may be an issue with your individual VMs.

Looking inside those specific Linux VMs, can you see whether they are swapping at all?
How have you configured those VMs in terms of RAM?
Are you using resource pools at all?
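One way to answer the first question from inside a guest is to sample the `pswpin` and `pswpout` counters in `/proc/vmstat` twice and compare them. The following is a minimal sketch with made-up helper names and hypothetical sample data. On a real VM the two text samples would be read from `/proc/vmstat` a few seconds apart. Note that this only covers swapping done by the guest kernel; swapping done by the ESX host is not visible from inside the guest.

```python
def parse_swap_counters(vmstat_text):
    """Extract pswpin/pswpout (pages swapped in/out since boot) from /proc/vmstat text."""
    counters = {"pswpin": 0, "pswpout": 0}
    for line in vmstat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in counters:
            counters[fields[0]] = int(fields[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two samples."""
    return {name: after[name] - before[name] for name in ("pswpin", "pswpout")}


# Hypothetical samples; on a real guest, read /proc/vmstat twice instead.
sample_before = "pgpgin 1000\npswpin 12\npswpout 40\n"
sample_after = "pgpgin 1500\npswpin 12\npswpout 95\n"

delta = swap_delta(parse_swap_counters(sample_before),
                   parse_swap_counters(sample_after))
print(delta)
```

Non-zero deltas mean the guest kernel swapped during the interval; zeros on a slow VM point back at the host or the storage.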

ewwhite
  • Can't see swapping (free -m). I have configured them to have plenty of RAM (2GB for a stripped down LAMP server used by me and 2 developers). Not using resource pools. – Henno Aug 15 '11 at 22:58
2

This could be a resource problem on the host, the VM, or both. Check the Performance tab in the VI Client and switch to Memory. Does it show swapping or ballooning? Now switch to Disk (requires 4.1). What is your latency in ms? Anything less than 10ms is fine. SATA disks can spike to 300ms or worse.

Finally, try running htop inside the Linux VM. Does it swap?

How many vCPUs do you use on each host? The host will waste a lot of cycles if you add too many vSMP (multi-vCPU) VMs to it. SSH into the host and run esxtop. Look at the %RDY value. This is the time the VM spends waiting for CPU access. If it is more than 10, you could have a problem.

By the way, you should upgrade to 4.1. 3.5 has huge performance problems.
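The rules of thumb above can be written down as a small checklist. This is only a sketch: the function name and report keys are invented for illustration, the thresholds are the ones stated in this answer (10ms disk latency, %RDY of 10), and the input figures are illustrative values that would have to be read off the VI Client and esxtop by hand.

```python
def triage(cpu_ready_pct, vcpus, physical_cores, disk_latency_ms=None):
    """Apply the answer's rules of thumb to manually collected figures."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")

    report = {}

    # Disk: anything under 10ms is considered fine; None means not measured.
    if disk_latency_ms is None:
        report["disk"] = "unknown"
    elif disk_latency_ms < 10:
        report["disk"] = "ok"
    else:
        report["disk"] = "check"

    # CPU ready: more than 10 could indicate a problem.
    report["cpu_ready"] = "check" if cpu_ready_pct > 10 else "ok"

    # vCPU overcommitment is reported as a ratio; no threshold is assumed.
    report["vcpus_per_core"] = vcpus / physical_cores
    return report


# Illustrative inputs: 50 vCPUs on 8 cores, %RDY of 1, latency not measured.
result = triage(cpu_ready_pct=1, vcpus=50, physical_cores=8)
print(result)
```

The ratio is left without a verdict on purpose, since how many vCPUs per core is acceptable depends on the load of the VMs.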

atmorell
  • Thank you for such a thorough summary of things to watch for! On the slow VM I just checked, I have swap disabled and quite a lot of memory seems to be free: http://pastie.org/2377513. I checked its Performance tab: it used ballooning for only 2 minutes in the past hour, and memory usage was under 5%. Memory swap in and out latest/average were 4-5 million (megabytes?) each. Can't tell you latencies (on 3.5). Htop: http://pastie.org/2377588. Would I have to manually count vCPUs? RDY is around 1. – Henno Aug 15 '11 at 22:53
  • You are welcome :) How much ballooning did the host use? You are running out of memory on the host. Memory and CPU look good on the Linux VM, though. Could you SSH into the ESX server and run esxtop? vCPUs are the virtual CPUs you are using on the host. If all 25 VMs on host 2 have 2 CPUs each, then you are using 50 vCPUs on 8 cores. That might be okay if there is no load on any of the servers. Right now it could be too many vCPUs, or your datastore is overloaded with IOPS. – atmorell Aug 15 '11 at 23:13
  • There was a big spike, but only for 1-2 minutes in the middle of the 60-minute period (in the real-time view). Today there is none, but the machine is fast too (after the nightly reboot, which is there to alleviate this problem). Esxtop from yesterday: http://pastie.org/2379123. There are 21 VMs on that host. 3 of them have 2 vCPUs and 2 of them have 4 vCPUs. I should probably check their load while the machines are slow. Is there any way to verify datastore load without purchasing the expensive EMC Navisphere Analyzer? – Henno Aug 16 '11 at 07:05