
Summary: Given the same hardware, Ubuntu outperforms Win2k3 by about 50 percent when it comes to IOPS. Win2008R2 initially performs the same as Win2k3, but it gradually gains performance until it is on par with Ubuntu (this takes about 20-30 seconds of continuous activity).

Question: Can anyone explain the Win2008R2 behavior?

Details: I'll only give details I think are relevant, but hit me up for more if I've missed something. We've got a SAN with an 11 x 600GB SAS RAID5 group, and everything is connected by 4Gbps fiber. I created a 500GB LUN and presented it to a physical server running Win2k3Entx64. I created another 500GB LUN and presented it to a server running ESXi, where I built three VMs: Win2k3Entx64, Ubuntu 10.04 x64, and Win2008R2x64. The physical Win2k3 server outperformed the Win2k3 VM by a small amount. The Ubuntu VM smoked both Win2k3 servers. The Win2008 VM behaved like Win2k3 for about 5-10 seconds, then gradually improved until it was about the same as the Ubuntu server.

Iometer setup: 32 KB transfers, 50% read, 0% random, 1 worker, 1.5 GB test file.

Results (average I/O response time / total I/O per second):

Win2k3x64 (physical): 1.0789 ms / 926.28 IOPS

Win2k3x64 (VM): 1.1786 ms / 847.81 IOPS

Ubuntu (VM): 0.7849 ms / 1273.00 IOPS

Win2008R2 (VM), initial: 1.0959 ms / 910.00 IOPS

Win2008R2 (VM), after 30 s: 0.8810 ms / 1133.66 IOPS
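With a single worker, and assuming Iometer's default of one outstanding I/O (the queue depth isn't stated above, so this is an assumption), the two columns should be roughly reciprocal by Little's law: IOPS ≈ outstanding I/Os / average response time. A quick Python check against the numbers above, which also works out the relative gaps behind the "about 50 percent" figure:

```python
# Results copied from the Iometer runs above: (avg response time in ms, total IOPS).
results = {
    "Win2k3x64 (physical)": (1.0789, 926.28),
    "Win2k3x64 (VM)": (1.1786, 847.81),
    "Ubuntu (VM)": (0.7849, 1273.00),
    "Win2008R2 (VM), initial": (1.0959, 910.00),
    "Win2008R2 (VM), after 30 s": (0.8810, 1133.66),
}

# Assumption: one outstanding I/O (Iometer's default); the post doesn't say.
OUTSTANDING_IOS = 1


def predicted_iops(avg_response_ms, outstanding=OUTSTANDING_IOS):
    """Little's law: throughput = concurrency / latency (latency converted to seconds)."""
    return outstanding / (avg_response_ms / 1000.0)


def percent_faster(iops, baseline_iops):
    """How much higher iops is than baseline_iops, in percent."""
    return (iops / baseline_iops - 1.0) * 100.0


if __name__ == "__main__":
    for name, (ms, measured) in results.items():
        print(f"{name:28s} measured {measured:8.2f}  predicted {predicted_iops(ms):8.2f}")
    ubuntu = results["Ubuntu (VM)"][1]
    print(f"Ubuntu vs Win2k3 VM:       +{percent_faster(ubuntu, results['Win2k3x64 (VM)'][1]):.0f}%")
    print(f"Ubuntu vs Win2k3 physical: +{percent_faster(ubuntu, results['Win2k3x64 (physical)'][1]):.0f}%")
```

Predicted and measured IOPS agree to within about 0.3% for every run, which is consistent with a queue depth of one. The Ubuntu VM comes out roughly 50% ahead of the Win2k3 VM and roughly 37% ahead of the physical Win2k3 server.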

You may be saying, "Switch to Linux!" Half of our applications rely on MS-SQL and Win2k3, so that is not an option. Switching to Server 2008 may be an option, but not before I figure out why I'm getting these results.

PART 2

Alignment was definitely the problem. It turns out that 2008 and Ubuntu align partitions automatically. Now for problem 2. With one Iometer worker, performance is fine. However, the VM falls further behind the physical server with every worker I add. I've gone up to 8 workers (1 per CPU), and at that point I'm again down 50 percent compared to the physical server.
I have tried 4K, 16K, and 64K NTFS allocation unit sizes on Win2k3. The SCSI driver is the LSI Logic PCI-X Ultra320 SCSI Host Adapter (version 5.2.3790.3959). CPU usage is about the same as on the physical server, so it's not a CPU resource issue.

You guys solved the first problem; hopefully you have some advice for this one too. Thank you!

truck0321
  • Latest version of vmtools installed on all? What HBA and what driver version on the physical box? What's the SAN? Oh, and what version of ESXi? – Chopper3 Nov 10 '10 at 18:39
  • What kind of filesystem on the Ubuntu VM? Mount flags? What block size on the array / block size of NTFS? – pauska Nov 10 '10 at 19:24
  • Oh, and what kind of virtual SCSI driver is in use on the VMs? – pauska Nov 10 '10 at 19:35

2 Answers


It looks to me like you haven't aligned the test volume on Win2K3. By default Win2K3 doesn't align partitions: the first partition starts right after the MBR at sector 63 (a 32,256-byte offset), so I/Os that would otherwise fit inside one RAID stripe unit end up crossing a stripe boundary, which carries a penalty, particularly on writes. Win2K8 automatically aligns partitions on a 1 MB boundary, which is a multiple of the stripe size on most RAID arrays. Recent Ubuntu builds also automatically start partitions at a 1 MB offset.

With your 32K I/O size you are likely to be hitting a lot of RAID stripe boundaries. The 50% penalty you're seeing is higher than anything I've seen from this, but the exact penalty depends on your RAID controller's stripe size.
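To make the stripe-crossing argument concrete, here is a small Python sketch that counts how many back-to-back sequential 32 KB I/Os straddle a stripe-unit boundary for a given partition offset. The 63-sector legacy offset and the 1 MB offset are the defaults described above; the 64 KB stripe unit is an assumption, since the controller's actual stripe size isn't known. A straddling I/O is split across two stripe units (typically two member disks), which is where the extra cost comes from.

```python
SECTOR_BYTES = 512
LEGACY_OFFSET = 63 * SECTOR_BYTES   # Win2k3 default partition start: 32,256 bytes
ALIGNED_OFFSET = 1024 * 1024        # 1 MB start used by Win2k8 and recent Ubuntu
STRIPE_UNIT = 64 * 1024             # assumption: 64 KB stripe unit on the array
IO_SIZE = 32 * 1024                 # 32 KB transfers, as in the Iometer test


def is_aligned(offset_bytes, stripe_unit=STRIPE_UNIT):
    """True if the partition start falls exactly on a stripe-unit boundary."""
    return offset_bytes % stripe_unit == 0


def crossing_fraction(partition_offset, io_size=IO_SIZE, stripe_unit=STRIPE_UNIT, n_ios=10_000):
    """Fraction of sequential I/Os (starting at the partition start) that span two stripe units."""
    crossings = 0
    for k in range(n_ios):
        first_byte = partition_offset + k * io_size
        last_byte = first_byte + io_size - 1
        if first_byte // stripe_unit != last_byte // stripe_unit:
            crossings += 1
    return crossings / n_ios


if __name__ == "__main__":
    for label, offset in (("legacy (63 sectors)", LEGACY_OFFSET), ("1 MB", ALIGNED_OFFSET)):
        print(f"{label:20s} aligned={is_aligned(offset)}  "
              f"crossing={crossing_fraction(offset):.0%}")
```

Under these assumptions, every other 32 KB I/O on the unaligned volume is split across two stripe units (50%), while none are on the 1 MB-aligned volume. A different stripe size changes the fraction, which is why the real-world penalty varies by controller.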

I've no good explanation for the W2K8 ramp up behaviour.

Helvick
  • I'll close this out and open a new ticket for part 2. Thanks! – truck0321 Nov 12 '10 at 18:05
  • Update: I solved the second problem. It turns out using 5-year-old fiber cables is bad. We reused some fiber cables from a previous setup, but who knew they would only connect at 2Gbps? Trying to save a buck ended up costing us a lot of time and money. – truck0321 Nov 16 '10 at 16:38
  • It's a healthy reminder that when you are looking at SAN performance issues, it's always a good idea to check the SAN fabric itself. I hadn't even thought to list that as a possibility. – Helvick Nov 16 '10 at 18:44

I agree with Helvick: 2003 is likely having alignment issues. I suspect that the 2008 box is making more use of its cache as time goes on. I would set up your IOPS test as close as possible to the way VMware did in:

100,000 I/O Operations Per Second, One ESX Host

Jim B