
I'm transferring a virtual machine from one ESXi host to another over SSH (scp), with SSH enabled on both ESXi servers.

But it's painfully slow: it's a massive 750GB .vmdk disk image, the VM is stopped (so there is downtime), and at 5.5MB/s the transfer will take more than a day to complete.

Am I missing something?

myvm.mydomain.com-flat.vmdk                 26%  200GB   5.4MB/s 29:09:06 ETA

Relevant hardware in the ESXi server:

Supermicro X9SCM-F
Intel 82574L Gigabit Controller
IBM SR-BR10i RAID Controller
2x WD Velociraptor WD1000DHTZ (RAID1 mode from controller)

Another point: I built and synchronised the RAID array before starting the VM migration.

Thanks for any help,

Vinícius Ferrão

3 Answers


If you think the copy is slow, you have not seen the VM run yet. The main trouble is that your controller is missing a BBU, and ESXi is doing many synchronous writes (where the controller's or disk's write cache, which otherwise might have been of use, is bypassed to ensure data consistency).

Add a BBU (if available as an option) or replace the controller with a model using BBWC/FBWC. Or, if you do not care about the integrity of the data (mind that this might result in losing the entire datastore if your host loses power at an untimely moment), you could enable the write-back cache even for synchronous writes using lsiutil. Someone has even compiled it for ESXi, so you likely would not even need to reboot into another OS to try it out.

Other than that, ESXi-internal scp/cp operations are rather slow, so you should pick a different approach:

  • For performance and data placement reasons, do not use scp or cp; instead, use vmkfstools, the Virtual Machine Importer tool from VMware, or the SDK APIs to manipulate your virtual disks. You should see very significant performance improvements if you use the recommended tools.

If you can't go with one of the mentioned tools, consider Veeam's FastSCP, which is also meant to improve SCP copy performance. (A minimal vmkfstools sketch follows below.)
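As a rough sketch of the vmkfstools route, assuming a datastore that is visible to both hosts (e.g. an NFS mount); the datastore and VM names below are placeholders, not taken from the question:

    # Create the target directory on a datastore both hosts can see (placeholder names)
    mkdir -p /vmfs/volumes/shared/myvm

    # Clone the disk with vmkfstools instead of scp/cp;
    # "-d thin" writes a thin-provisioned copy of the source disk
    vmkfstools -i /vmfs/volumes/datastore1/myvm/myvm.vmdk \
        /vmfs/volumes/shared/myvm/myvm.vmdk -d thin

The destination host can then register the VM from that shared datastore. If no shared datastore is possible, VMware Converter / the Virtual Machine Importer mentioned above is the supported alternative.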

the-wabbit
  • Hello Syneticon. Thanks for your thoughts. About the controller: we don't have the money to buy another one, and I think this controller can't support BBWC. So I'm thinking of a workaround: we have some APC UPS units with management; can we use them to safely power off the servers during an electrical failure? Would it be OK to enable the write-back feature then? – Vinícius Ferrão Jul 19 '13 at 14:34
  • A BBU to enable write-back will help immensely with a random write workload, not so much with a 1TB mostly-sequential write. – MikeyB Jul 19 '13 at 14:47
  • @MikeyB with ESXi, it will. – the-wabbit Jul 19 '13 at 15:49
  • @ViníciusFerrão there surely will still be failure modes where it won't be safe (UPS defective, server cable pulled, shutdown not completing in time, ...). But it should be better than nothing. – the-wabbit Jul 19 '13 at 15:53
  • OK... perhaps a paradigm change will help with my issue? We could build a FOSS NAS appliance like FreeNAS with a lot of drives, and we have two 128GB SSDs for caching, so it could work as a SAN over iSCSI to serve the VMware ESXi hosts over a gigabit connection? We already have this hardware. – Vinícius Ferrão Jul 19 '13 at 16:32
  • @ViníciusFerrão this sounds like it might work out, but I obviously have no idea if it will suit your needs. Also, I know nothing about FreeNAS, so I cannot tell whether its caching policy can use SSDs for writing and is transactionally safe for synchronous writes under all circumstances. – the-wabbit Jul 19 '13 at 16:56
  • Thanks @syneticon-dj; it can be done... My question was whether iSCSI over Gigabit Ethernet will give us better performance than the slow controllers. – Vinícius Ferrão Jul 19 '13 at 19:42
  • @ViníciusFerrão iSCSI with the software initiator on ESXi works decently over GE, especially if it has a separate channel for iSCSI traffic. Bear in mind though that you *will* be getting higher storage latency compared to local storage, no matter what you do. But from your current setup and performance numbers, it does not seem like a major concern (see the esxcli sketch below). – the-wabbit Jul 19 '13 at 21:33
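A minimal sketch of what wiring ESXi to such an iSCSI target could look like on the ESXi side, assuming the esxcli syntax of ESXi 5.x; the adapter name (vmhba33) and the target address are placeholders, not taken from this thread:

    # Enable the software iSCSI initiator
    esxcli iscsi software set --enabled=true

    # Find the name of the software iSCSI adapter (e.g. vmhba33)
    esxcli iscsi adapter list

    # Point dynamic discovery at the FreeNAS/NAS target (placeholder address)
    esxcli iscsi adapter discovery sendtarget add --adapter=vmhba33 --address=192.168.1.10:3260

    # Rescan so the exported LUN shows up as a datastore candidate
    esxcli storage core adapter rescan --adapter=vmhba33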

ESXi is not designed to be used as a general-purpose *nix; VMware rarely encourages use of the command line, and when they do it's for specific tasks. As such, the command-line interface is fairly heavily resource-starved in terms of memory, I/O and CPU share. You are treating it as a general-purpose OS by asking it to do something fairly intensive, so I'm really not surprised that it's performing badly; it won't be your disk subsystem's fault.

If you use a supported transfer method I'm sure you'll be happier.
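As one hedged example of a supported path (not something this answer spells out): VMware's standalone ovftool can export and import a powered-off VM directly between two hosts, doing the copy from a workstation instead of inside the ESXi shell. The host names, VM name and datastore below are placeholders:

    # Move the powered-off VM host-to-host with ovftool, run from a separate
    # workstation that can reach both ESXi hosts (it prompts for the root passwords)
    ovftool --datastore=datastore1 \
        "vi://root@esxi-source.mydomain.com/myvm" \
        "vi://root@esxi-dest.mydomain.com"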

Edit - Oh, and I've just noticed that those disks are not 100% duty cycle, i.e. they're not designed to be run 24/7; doing so will significantly increase their likelihood of failure. Were you planning on only running this server for around 12 hours a day?

Chopper3
  • Hello Chopper3, we don't have the required funds to buy enterprise SAS storage; we are trying to do the best with what we have. – Vinícius Ferrão Jul 19 '13 at 14:31
  • 1
    Ok, it'll break and you'll leave that lesson but ok. You don't need to buy enterprise SAS disks, just disks designed to work 24/7, that or don't run them 24/7 - either way works but trying to make the cheapest thing do the hardest job will lose you data. – Chopper3 Jul 19 '13 at 14:33
  • Chopper3, where are you seeing these specifications? – Vinícius Ferrão Jul 19 '13 at 14:43
  • http://eshop.macsales.com/item/Western%20Digital/WD1000DHTZ/ footnote 3 - disk manufacturers expressly state when their disks are 100% duty cycle; they don't accidentally leave it out of a spec and are more than happy to advertise this capability for disks that have it. Those disks are for workstations, gamers or non-24/7 servers. – Chopper3 Jul 19 '13 at 14:55
  • For desktop drives WD assumes a 35.6% duty cycle (but tests at 100% duty cycle): http://www.wdc.com/wdproducts/library/other/2579-001134.pdf. Your VM may not be very busy, so it might make sense for you, but don't expect any sort of transactional performance. – MikeyB Jul 19 '13 at 14:56
  • The Velociraptors are [spec'ed for "low end servers"](http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-701282.pdf), so a 100% duty cycle is assumed. The performance is about what you see with 10k 2.5" SAS drives - there is not that much difference in terms of IOPS. They *do* break though and need monitoring so you can replace them in time. – the-wabbit Jul 19 '13 at 15:58
  • @syneticon-dj - WD caveat their MTBF based on duty cycle (see my link above); they only do this for disks that are not rated for 100% DC. – Chopper3 Jul 19 '13 at 16:19

For a reason unknown to me, SCP between ESXi (free) hosts is horribly slow. The workaround I use for this issue is to transfer the VM with SCP to a non-virtual machine and then SCP it to the destination ESXi host. It's not very clever, but it changed the transfer speed from 5MB/s to 80MB/s. I had the same slow transfer with Veeam FastSCP, and vmkfstools didn't work for me (I don't have shared storage for the guests), so I couldn't think of a better solution.
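A minimal sketch of that two-hop copy, assuming an intermediate Linux box with enough free disk space (host names, datastore and paths are placeholders):

    # Hop 1: pull the VM directory from the source ESXi host to a staging machine
    scp -r root@esxi-source:/vmfs/volumes/datastore1/myvm /data/staging/

    # Hop 2: push it from the staging machine to the destination ESXi host
    scp -r /data/staging/myvm root@esxi-dest:/vmfs/volumes/datastore1/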

If someone could explain why SCP between ESXi (free version) hosts is so terribly slow, I would be thankful.

Maurício Mota