4

We have a client who is complaining about the performance of an application that uses an MS SQL Server database. They do not believe the performance issues are the fault of the application itself.

The Smart Array E200i RAID controller has 128 MB of cache, and we have the cache ratio set to 75% read / 25% write. Write caching is enabled on the disk array.

Recently we ran a disk performance test using SQLIO, based on this guide. We used a 10 GB test file and found that the average sequential read rate was ~60 MB/sec (megabytes/sec) and the average random read rate was ~30 MB/sec. Are these numbers on par with what the server should be doing? Better than par? Horrible? Amazing?
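
For reference, the exact switches are in the linked guide's batch file. Purely for illustration (every parameter below is an assumption, not necessarily the guide's actual values), a sequential/random read test pair looks like this:

sqlio -kR -s300 -fsequential -o8 -b64 -LS -Fparam.txt
sqlio -kR -s300 -frandom -o8 -b64 -LS -Fparam.txt

where param.txt points at the 10 GB test file (path and thread count also assumed):

d:\testfile.dat 2 0x0 10240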

Additional information on the server set up/RAID controller config:
There are three 146 GB SAS 10k RPM 3.0 Gb/sec drives (model HP DG146BABCF), configured in a RAID 5 array. These are the only physical disks available to the server, so logs and data, including the operating system and paging file, are all on the same physical disk array (there are two logical drives, with the OS data kept separate). The array stripe size is set to 64k. Total usable space is 273 GB.
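
(As a sanity check on that figure: RAID 5 yields n - 1 disks of usable capacity, so 2 x 146 GB = 292 GB decimal, which Windows reports as roughly 273 binary GB.)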

The HP Advanced Data Guard is turned off. Rebuild and expand priority are set to medium. Surface scan delay is 15 sec. The controller has a cache board and a battery pack.

ewwhite
Nate Pinchot
  • You should probably post the configuration (RAID levels, where it stores data files and where it stores log files). – pauska Jun 11 '10 at 13:45
  • Good call, sorry for not including that in the first place. – Nate Pinchot Jun 11 '10 at 13:54
  • The disk performance sounds on par for an E200 and three 10k SAS drives. It sounds like the application needs a "faster" disk setup to improve performance (or the app is just poorly written; very hard to say without knowing how it works). Switching to a P600 or P800 would make a small improvement; more disks in a RAID 10 would make a bigger one. – Chris S Jun 11 '10 at 14:04
  • The app is terribly written. It runs the same query, which takes 50+ seconds to execute, six times in a row, no more than a second after it completes. It has clustered indexes on nvarchar(40) columns that are 95% null values. It constantly searches on non-indexed columns. We are not allowed to adjust any of this because the client's friend wrote the app, and as far as the client is concerned, that person walks on water. – Nate Pinchot Jun 11 '10 at 14:11

6 Answers

4

Too many imponderables. For example, how are the disks set up? If the logs and data share the same disks, the random I/O from the data areas will disrupt the log traffic, which is mostly sequential I/O and is disproportionately affected by a busy random-access workload on the same disks.

Without some more insight into your configuration I can't really say what might be causing the problem.

For example, 60 MB/sec off a RAID array is about right for a 4-disk RAID 5 or RAID 10 with 64k stripes and 15k drives: each drive will read one 64k stripe per revolution of the disk (about 250 revolutions/sec for a 15k drive), which gives you roughly 15 MB/sec per drive.

The average seek time for a 15k disk is around 3 ms across the whole disk. For a mostly contiguous 10 GB file on a RAID volume with (say) 146 GB or 300 GB disks, and with a bit of help from the cache, I could see 30 MB/sec being a reasonable figure for a disk array configured as described above; it works out to a data read about every two revolutions of the disks.
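
As a rough sanity check of that arithmetic (using the hypothetical 4-disk, 15k-drive setup above, not necessarily the actual array):

15,000 RPM / 60 sec             = ~250 revolutions/sec per drive
250 rev/sec x 64 KB per stripe  = ~16 MB/sec per drive (sequential)
4 drives x ~15 MB/sec           = ~60 MB/sec for the array

30 MB/sec / 4 drives   = ~7.5 MB/sec per drive (random)
7.5 MB/sec / 64 KB     = ~117 reads/sec, i.e. one read roughly every 2 revolutions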

That's a thought off the top of my head for a configuration one might reasonably expect to see on an ML350. However, I have no idea if that matches your actual configuration, so I can't really comment on whether the observations are relevant in your case.

  • My apologies for not including more detail on the hardware configuration in the first place; the details are now in the question above. – Nate Pinchot Jun 11 '10 at 13:50
  • Logs on RAID 5 is not a recommended practice; logs and data on the same array is not recommended; and finally, OS, logs, and data on the same array is not recommended. – pauska Jun 11 '10 at 13:58
  • Indeed :) I guess I should have mentioned that at this point we are somewhat at odds with the client, who argues that the software is not the issue and that the servers are not performing as well as they should (so, in turn, they are not currently willing to spend money on hardware upgrades). I am just trying to prove to the client that the server is performing optimally for the hardware available, because they believe it is not. – Nate Pinchot Jun 11 '10 at 14:07
  • Selected this as the answer because it more accurately addresses the original question, but @steveburkett's answer below is extremely relevant to the situation as well. – Nate Pinchot Jun 11 '10 at 14:20
4

The E200i has notoriously poor performance, as documented by Lukas here and Ryan here. Check that you've got the optional battery kit (BBWC) attached and that the HP Array Configuration Utility shows the battery status as OK. (In the HP Array Configuration Utility, click the controller in the Configuration View in the center and choose More Information from the Common Tasks menu on the right.) Having this battery in place gives the disk controller a good boost in performance.
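
If you would rather check from the command line, the HP CLI can report the same thing (the slot number below is an assumption, and the exact field names vary by firmware):

ctrl slot=0 show

Look for the Cache Status and Battery/Capacitor Status lines in the output.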

But your best bet is going to be to swap out the E200 for a 'proper' HP Smart Array controller; a P400 or P600 with 512MB BBWC should give you a good speed boost.

SteveBurkett
  • The battery pack is installed and the status is okay. I'll check out the P400 and the P600, thanks. – Nate Pinchot Jun 11 '10 at 13:57
  • +1 We have struggled with this, and it comes up in the HP forums. Never found a great fix; the upgraded cards seem to be the answer. – Dave M Jun 11 '10 at 14:07
  • Do you think there would be a considerable performance increase if we upgraded only the controller to the P400 or P600? At less than $1000, I might be able to convince them to get just the controller. Or would upgrading only the controller be a negligible improvement? – Nate Pinchot Jun 11 '10 at 14:14
  • Never mind that last comment, I see Chris S above said swapping out the controller would probably only make a small improvement. – Nate Pinchot Jun 11 '10 at 14:16
2

Below is a benchmark we ran on similar hardware, with a few differences noted. I would guess some of the performance hit is due to the partitions being misaligned (Windows 2003 misaligns partitions by default). Run the following command; if the starting offset is 32256 (63 sectors x 512 bytes, which is not a multiple of a 64k stripe), the partition is misaligned.

wmic partition get index, blocksize, name, startingoffset

To properly align the partitions, you need to recreate them with the DISKPART utility, as shown below.
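
A minimal DISKPART session for a freshly created volume might look like the following; the disk number, drive letter, and 1 MB alignment are assumptions, and creating the partition destroys any data on the selected disk:

diskpart
select disk 1
create partition primary align=1024
assign letter=E
exit
format E: /FS:NTFS /A:64K /Q

(The align= parameter requires the Windows Server 2003 SP1 or later version of DISKPART; /A:64K sets the 64k NTFS allocation unit discussed in the comments below.)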

Test Hardware:
HP DL380 G5
2 sockets, 4 total cores
16GB of RAM
HP P400 controller
512MB cache
25% Read / 75% Write
10K RPM HDDs
Windows Server 2003, 32-bit
Five-minute test cycle

Results (MBytes/second):

                   Sequential Read   Random Read   Sequential Write   Random Write
RAID 5, 3 disks          180              180             120              130
RAID 5, 4 disks          240              260             175              180
RAID 5, 5 disks          310              320             210              225

Greg Askew
  • Thanks for the information. I ran the wmic command above; StartingOffset is 16384. What is the proper StartingOffset? Thanks for the performance benchmark as well. I may ask the client to try upgrading only the controller and keeping the current drives; since that is a sub-$1000 upgrade, they may go for it. – Nate Pinchot Jun 11 '10 at 17:26
  • Here is an excellent article on calculating partition offset. Note that in Windows Vista/7/2008, the starting offset is 1 MByte (1048576 bytes), which is properly aligned. http://msdn.microsoft.com/en-us/library/dd758814.aspx – Greg Askew Jun 11 '10 at 17:54
  • You may also want to run fsutil fsinfo ntfsinfo x:, where x: is your partition. This will confirm your NTFS allocation unit size. If it is 64k, which is a best practice for SQL, you would want a starting offset that is a multiple of 64k, not 16384. – Greg Askew Jun 11 '10 at 18:04
2

You should not be using a "pseudo-degraded" RAID 5 as a database backing store; as far as I know it is the slowest possible disk configuration for this, since every small random write turns into a read-modify-write of both data and parity. Add in a low-end controller and things will not get better.

Hitachi AMS owners can ignore this post (a 4-disk RAID 5 there does over 300 MB/sec). For the rest it probably applies.

darkfader
1

Does the Smart Array E200i have a battery-backed write cache (BBWC) installed? I have seen some dire performance out of Smart Array controllers without one.

As for the data rate readings you took, I can't really comment; I have seen 60 MB/sec out of a high-end desktop system. However, data rate readings can lie! Post the parameters for the tool you used so we can make a better comparison.

Richard Slater
  • The controller does have the battery pack installed. The parameters are in the link for the guide, I copied his batch file verbatim (changing the drive letter of course). – Nate Pinchot Jun 11 '10 at 13:56
1

The short answer: ENABLE the drive write cache (dwc) BUT set the controller's write cache ratio to 0% via the HP CLI:

ctrl slot=0 modify cacheratio=100/0
ctrl slot=0 modify dwc=enable

Turn it on, but don't use it. Go figure!

I just set up an Open-e DSS V6 server on an HP ML350 G5 with an E200i RAID controller, 128 MB BBWC, and 6 x 1.5 TB 7200 RPM drives in a RAID 10 array. I was getting very bad performance: less than 100 iostats on an iSCSI file I/O volume with a high initialize rate. I have similar configurations using 3Ware 9550 cards producing well over 1000 iostats, so I was trying to figure out what was going wrong.

A bit of research directed me here and to an Experts Exchange article (http://www.experts-exchange.com/Storage/Hard_Drives/Q_24947953.html). It would appear the RAID 5 processor and the write cache are the troublesome components of this controller. I wasn't using RAID 5, so I experimented with the read/write cache ratio through the HP controller CLI that Open-e conveniently includes in their software.

The CLI was running VERY slowly; commands took a minute to respond. Finally, the commands above made the array perform closer to expectations. I am now seeing almost 1000 iostats and the CLI is responding normally. Yes, you have to enable the write cache and set it to 0%; no other combination seems to work. Even the disable-write-cache option failed to provide acceptable performance.
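
To confirm the settings stuck, the controller status can be re-read from the same CLI (slot number assumed; exact output fields vary by firmware):

ctrl slot=0 show

The output should now report the cache ratio as 100% read / 0% write, with the drive write cache enabled.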