We have a RHEL5 install on a machine with five Seagate 1 TB SAS disks. One holds the OS and swap. The other four are in a hardware RAID 5 array (mounted as /home) managed by a Dell PERC 6/i controller. When the system boots it spends a long time on udev, which eventually times out. Nonetheless, the write speed on the array starts out in the 90-100 MB/s range. We tested write speed by copying a large file (~3 GB) over and over. On the seventh copy the write speed suddenly fell to 1 MB/s. We have tested this multiple times and the problem is reproducible. There are no messages in any of the logs in /var/log/. free -m and vmstat do not show any swap activity. A reply to a similar post asked for the contents of /proc/mdstat. Ours are:

# cat /proc/mdstat
Personalities : 
unused devices: <none>

I am not sure how to trace the problem beyond this. This system worked fine for over a year and a half. The problem started after we tried to upgrade from 1 TB disks to 2 TB disks to increase disk space. The new disks were installed and the array was built from scratch. We have since reverted to the 1 TB disks, since we know that hardware configuration used to work. Any suggestions or troubleshooting tips are welcome. Thank you for your time and patience.

EDIT: Problem solved. It turns out that the machine needed many firmware updates. Around the time the 2 TB disks were first tried, the RAID controller was updated, and this was causing issues. Now, after installing three BIOS updates and one newer controller firmware update, the machine works like a charm. Write speeds are in the 180 MB/s range. Thanks to the people who tried to help.

1 Answer

If this happened suddenly, and is still happening with the 1TB disks you had previously (your question is unclear as to whether the fault happens just with the 2TB disks or with both 1TB and 2TB) then I'm leaning towards a hardware level fault of some kind. Perhaps the battery backing the write-cache on the controller is futzing out, causing it to fail after a certain period and forcing the controller to go into write-through mode. Or maybe it's having a thermal fault, and is throttling down to avoid letting the magic smoke get out.

If it is only happening with the 2TB disks, but not with the 1TB disks, it could be that the card just plain can't handle 2TB drives. Some can't. Once writes get to a certain level, the internal data structures overflow and efficiency is lost. That might be fixable with a (future) firmware update, or it could be intrinsic to the card itself. Without knowing the exact card in use I can't find it myself, but looking up the supported drives for the card would be the next step if this is the case.
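In either case it helps to record exactly when the throughput collapses, since a drop after a fixed amount of elapsed time would point towards the cache or thermal theories, while a drop after a fixed amount of data would point elsewhere. Below is a minimal sketch of such a test, not a definitive benchmark. It assumes a Python 3 interpreter is available (stock RHEL5 ships an older Python, so it may need a separate install), and the function names `timed_write` and `run_passes` are made up for illustration. It only writes zero-filled files rather than copying an existing file, and it deletes each file after the pass, so it approximates the copy test from the question rather than reproducing it.

```python
import os
import time


def timed_write(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write total_bytes of zeros to path, fsync, and return the rate in MB/s."""
    chunk = b"\0" * chunk_bytes
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        # Push the data out of the page cache so the timing includes the device.
        os.fsync(f.fileno())
    elapsed = time.monotonic() - start
    return (written / (1024 * 1024)) / max(elapsed, 1e-9)


def run_passes(directory, passes, total_bytes):
    """Run several write passes in directory.

    Returns a list of (pass_number, seconds_since_start, mb_per_second).
    """
    results = []
    t0 = time.monotonic()
    for i in range(1, passes + 1):
        # Hypothetical file name; any scratch name on the array would do.
        path = os.path.join(directory, "writetest_%d.bin" % i)
        rate = timed_write(path, total_bytes)
        results.append((i, time.monotonic() - t0, rate))
        os.remove(path)  # assumption: free the space before the next pass
    return results
```

Something like `run_passes("/home", 10, 3 * 1024**3)` would mirror the ten-or-so 3 GB copies described in the question. Comparing the elapsed-time column across a few runs with different file sizes should show whether the slowdown follows the clock or the volume of data written.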

sysadmin1138
  • Thanks for the reply. The problem exists with both 1 TB and 2 TB disks. I had mentioned that the controller is a Dell PERC 6/i. We've updated the firmware on the controller to the latest that Dell offers. I am leaning towards a hardware failure too, but am at a loss to figure out which part is causing the problem. Is there any way I can check for a thermal fault? Again, thanks for the reply. – Kranthi Varala Jul 06 '10 at 23:42
  • +1 for " avoid letting the magic smoke get out" LOL! – Billy ONeal Jul 08 '10 at 00:00