3

I have Citrix XenServer 6.1 installed on an HP server with 1x Intel Xeon E5630 @ 2.53GHz (4 cores, 8 threads) and RAM split as 752MB for dom0 and 55286MB for the guests. The server has 2 mirrored SCSI disks and is used for dev/testing.

The host runs 5 guests that I powered on or off during the tests. Most of the performance tests gave poor results, and I'd like to tune Xen: for instance, a "dd if=/dev/zero ..." gives 130MB/s on the host, but only 75MB/s on a lone guest.

If I run a dd if=/dev/urandom on a guest, the CPU is saturated and I get 7MB/s.
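
The tests were roughly of this form (block size, count and output path here are only illustrative; the exact commands are abbreviated above):

dd if=/dev/zero of=testfile bs=1M count=1000     # sequential write test
dd if=/dev/urandom of=testfile bs=1M count=100   # CPU-bound: generating urandom data is the bottleneck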

Using tune-vcpus, I managed to give 8 vCPUs to dom0, but performance got worse. The default is 4 vCPUs, and I'd like to give dom0 only 1 vCPU, but that doesn't work.

Here is what I tried:

  • Edit /etc/sysconfig/tune-vcpus and set NR_DOMAIN0_VCPUS=1 and MAX_NR_DOMAIN0_VCPUS=1
  • /etc/init.d/tune-vcpus start min
  • reboot the host
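
To check what dom0 actually got after the reboot (using the xe CLI in dom0; VCPUs-at-startup is the control domain's configured vCPU count):

DOM0_UUID=$(xe vm-list is-control-domain=true params=uuid --minimal)
xe vm-param-get uuid=$DOM0_UUID param-name=VCPUs-at-startup
grep -c ^processor /proc/cpuinfo    # vCPUs actually visible to dom0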

I also tried

/opt/xensource/libexec/xen-cmdline --set-dom0 blkbk.reqs=256

to get more performance from the storage, but it doesn't change anything.
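
To confirm whether the parameter actually reached dom0 after the reboot (assuming xen-cmdline appends it to dom0's kernel command line):

grep -o 'blkbk[^ ]*' /proc/cmdline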

And I enabled QoS on the VBDs and gave the highest priority to the one where I run my tests.
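
For reference, the equivalent xe commands look roughly like this; the ionice scheme and its sched key are quoted from memory and should be double-checked against the XenServer admin guide, and <vbd-uuid> is a placeholder:

xe vbd-param-set uuid=<vbd-uuid> qos_algorithm_type=ionice
xe vbd-param-set uuid=<vbd-uuid> qos_algorithm_params:sched=rt    # "rt" = highest (real-time) ionice class
# note: ionice-based VBD QoS is generally only honoured when the underlying dom0 device uses the cfq scheduler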

After all that, I don't get any gain in I/O performance. Is there anything else to do?

rubo77
Icu

3 Answers

1

You can try to find a good value for max_sectors_kb. By default it is set to 512 or 1024. For example, you can set it to 128 and test again (in dom0 as well as in the domU).

echo 128 > /sys/block/[your blockdevice]/queue/max_sectors_kb 

This setting is not persistent. Put an entry into /etc/rc.local to set it on startup.
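
For example, a minimal /etc/rc.local entry might look like this (xvda is only a placeholder; use your actual block device and repeat the line for each device you tune):

# re-apply the non-persistent queue setting at boot
echo 128 > /sys/block/xvda/queue/max_sectors_kb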

Please post your results.

Striker_84
  • I did it on the host and on the guest + "dd if=/dev/zero ..." -> same result: host (130MB/s), guest (75MB/s) – Icu Apr 24 '13 at 09:48
  • OK. Are recent XenTools installed on the guest? Could you please do another test with the Citrix Performance VM as well? (http://support.citrix.com/article/CTX127065) – Striker_84 Apr 24 '13 at 10:40
  • Yep, the Xen tools are up to date. I ran a few tests with the Performance VM. Random write with 4MB blocks: 110-135MB/s. With 4KB blocks (like my dd tests): 0-13MB/s – Icu Apr 24 '13 at 11:52
  • The Performance VM results were so good I ran some more tests (dd if=/dev/zero ...) with different types of filesystem and mount options: ext4 now gives an average speed that goes from 119 to 180MB/s, ext3: 267-294MB/s, ext2: 266-294MB/s. I want to keep journaling on the filesystems, so ext3 seems to be the right one. But this doesn't change anything when I test it with 'dd if=/dev/urandom ...'. The guest has 2 vCPUS and 'dd' saturates one of them. I guess there is no way to give my guest more CPU resource, but do you know any setting I could tweak? – Icu Apr 25 '13 at 09:02
  • you can set the cpu priority for this VM (xencenter), but I don't think that would change a lot. I wouldn't change the dom0 vcpus count, leave it to the defaults. Please add more ram to dom0 (see http://support.citrix.com/article/CTX126531) and disable any processor c-states (see http://support.citrix.com/article/CTX127395 and http://support.citrix.com/article/CTX130464 for this). – Striker_84 Apr 25 '13 at 18:15
  • @Icu The problem is that /dev/urandom is quite slow. That is the bottleneck. – Kai Zhang Jul 29 '17 at 05:16
1

It sounds like you are referring specifically to storage throughput from a guest. This is only one among many performance metrics. I/O throughput is a function of bandwidth and latency: to achieve high throughput, you need to minimise latency and maximise bandwidth (i.e. have a lot of data in flight at any time, and have the requests for that data served as fast as possible).

When you are in a virtualised environment, you will inevitably have added latency to serve your requests. That means it will be very hard for a domU to match the throughput you see in dom0. Hopefully, this impact is minimised by allowing more data to fly at any time (either by having many VMs doing I/O or by having large enough requests).

Given your hardware and the rates you are referring to (~150 MB/s), I would be very surprised if you cannot see a similar throughput from a guest given your dd has the correct parameters.

Give this a go from your "dom0" and from your "domU" (the command below will write 500MB of data to delete.me):

dd if=/dev/zero of=delete.me bs=1M count=500 oflag=direct

The oflag=direct ensures that these writes bypass the VM's buffer cache (in dom0 as well as in domU).
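
If you also want to measure reads without the page cache, iflag=direct is the read-side counterpart (this reads back the file written above):

dd if=delete.me of=/dev/null bs=1M iflag=direct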

Also, refer to the following document to better understand how XenServer 6.1.0 implements virtualised storage and any of its performance implications (including tuning advice for number of vCPUs and pinning):

http://support.citrix.com/article/CTX136861

Dave M
1

One thing you can try is switching the I/O scheduler in the VM to deadline, and also disabling I/O merges there. That should reduce I/O latency in general, and it might work better with Xen's I/O ring structure.

echo 1 > /sys/block/$dev/queue/nomerges
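
And for the scheduler switch itself, a sketch along the same lines ($dev again stands for the guest's block device, e.g. xvda):

echo deadline > /sys/block/$dev/queue/scheduler
cat /sys/block/$dev/queue/scheduler    # the active scheduler is shown in brackets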

My reasoning, and experience, is that Linux's block layer isn't really smart but thinks that it is. It will ideally try to merge all I/O requests into one before passing them to the disk driver. In Xen's case, the next step is to break them apart again so they fit into the ring buffer. Once they arrive in dom0, the disk driver there is in a much, much better position to work out what needs to be merged, since it doesn't have to rely on a single lone VM's knowledge of the alignment.

If it doesn't help, then undo the change, but it's where I'd look when everything else has already been tried.

Maybe your server has been abandoned long ago, but I hope this will still help someone else.

Florian Heigl