We have an ATTO R380 SAS RAID controller in an HP DL-160 server running Windows Server 2003, with eight 700GB Seagate 7200.12 drives.

It's configured as a RAID 5 array. ATTO Config reports all drives as healthy, and we see consistent read/write rates across the drives.

The problem is that when transferring files from the array to the local C: drive, or even over our 1Gb LAN, performance is very inconsistent and sometimes quite poor.

For instance, some files copy at over 100MB/sec while others drop as low as 10MB/sec, with the rate fluctuating between 10MB/sec and 35MB/sec.

I know RAID 5 isn't good for write performance, but what could be causing this weird, inconsistent read performance?

Any ideas?

Thanks!

  • Take a look at the perfmon disk counters - especially the queue length and the disk idle time correlated to the throughput numbers to make sure that it is indeed a problem with your disk array and the transfer is not bottlenecked elsewhere (on the destination disk for example). – the-wabbit Dec 10 '11 at 23:29
  • Hi thanks for the answer. We didn't measure disk idle time (although I will look at that now). We did do transfers from array to local C: and also from array directly over the network to a client and the same disk transfer rates were seen: (low bandwidth and very spiky/inconsistent speeds). – megagram Dec 11 '11 at 06:58
  • So it looks like during phases where the transfer speeds diminish, the disk idle time actually increases to 100%. The disk idle time has corresponding troughs to the peaks in the transfer rate. Is this a controller issue then? – megagram Dec 12 '11 at 05:36

1 Answer

If the array performance turns out to be the bottleneck (see my comment above on how to use perfmon to check on that), there are a couple of things you could do:

  • check the SMART attribute values of your drives for any suspicious values (like remapped sectors) and the drive's SMART error logs for any errors - if there is no way to do this through the controller, pull the drives, hook them up to a non-RAID SATA controller and use smartctl -a <drive> to read the attributes and the logs. smartctl is available either on nearly every Linux-based rescue CD like sysrescuecd or as a Windows port at the smartmontools website.
  • if there is a firmware/driver update for the controller, flash/install it
  • since it is a controller with external ports for an external enclosure, cabling might be an issue. Unfortunately, checking this is troublesome: unless your controller exposes some kind of protocol-level counters or traces, you would need a SAS tap device, which is both expensive and hard to operate. Simply replacing all cables as a shot in the dark is probably the better alternative.
  • last but not least: contact ATTO technical support for assistance. If there are diagnostic procedures specific to the controller, they should know about them. If there are known issues, they should know as well.
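
As a rough illustration of the SMART check above, the attribute table printed by smartctl -a can be scanned for non-zero raw values on the attributes that most often point to media or link trouble. The Python sketch below is only an example under stated assumptions: the watch list, the function name suspicious_attributes and the sample text are made up for illustration, and it expects the usual ATA attribute table layout with the raw value in the tenth column.

```python
# Attributes whose non-zero raw value usually deserves a closer look.
# This watch list is an illustrative assumption, not an official one.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def suspicious_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes with raw value > 0."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


# Hypothetical excerpt of an attribute table, for demonstration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       134217728
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
"""

if __name__ == "__main__":
    print(suspicious_attributes(SAMPLE))
```

Saving the output of smartctl -a for each drive to a text file and passing the file contents to the function gives a quick per-drive comparison. A growing CRC error count tends to suggest a cabling or link problem rather than a failing disk, which ties in with the cabling point above.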
the-wabbit
  • Thank you very much for your thorough answer. We have already checked SMART and done the firmware flash/driver update thing. We haven't looked at cables yet but I suppose this might be something we could look at and we will. – megagram Dec 11 '11 at 17:42
  • Do you think it has anything to do with Windows/NTFS? Would it be at all likely if we reformatted the array with EXT4 or XFS under a newer linux build that we would see improved performance? – megagram Dec 11 '11 at 17:43
  • No, I do not think it is filesystem-specific. My first guess would be that your transfer rate bumps are not a malfunction, but simply the result of the current load pattern - copying a high number of files or concurrent I/O would be more seek-intensive and thus slower than a linear read/write. The second guess would be a bug or hardware problem of some sort. It makes sense to try another operating system - if only to rule out a possible driver bug. – the-wabbit Dec 11 '11 at 22:30
  • This is great info and I apologize for not providing more detail to begin with. We are testing with one single file copy (not many small files or anything). There is no other load on the array or server. We have also booted into Linux and see the same results. I found this: http://support.microsoft.com/kb/929491 I want to believe that it is somehow the NTFS/partitioning that is giving us grief. All 8 drives in the array have consistent read speeds when copying and no SMART alerts or anything. – megagram Dec 12 '11 at 04:27