
I have a server I maintain for a client. It's an Intel Nehalem-based computer, so not new, but in working order. It has a 3ware 9650SE-24M8 card with 20 drives attached and 4 empty slots, configured into 5 RAID5 arrays. Three of these arrays are made up of 2TB drives, the other two of 3TB drives. In the past two weeks we have lost three 3TB drives from the same array, two of them on the same day. We make nightly backups, so there has been no data loss, but the downtime is expensive, as are the replacement disks. By "lost" I mean that the card reports the drive with an ECC error status. The 2TB drives are approaching 25,000 operating hours; the 3TB drives are approaching 10,000.
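For a sense of scale, the power-on hours quoted above convert to years of running as shown in this small sketch. It assumes the counts reflect 24x7 operation and ignores leap days, so treat the results as approximate drive ages.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours; leap days ignored


def hours_to_years(hours: float) -> float:
    """Convert a SMART power-on-hours count to years of 24x7 operation."""
    return hours / HOURS_PER_YEAR


for label, hours in [("2TB drives", 25000), ("3TB drives", 10000)]:
    print(f"{label}: {hours} h ~ {hours_to_years(hours):.2f} years")
```

That puts the 2TB drives at a bit under three years of continuous use and the failing 3TB drives at only a little over one year.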

Ambient temperatures are roughly 25C, while the drives (according to SMART) idle at about 28-30C. The operating system is Fedora Linux 13 amd64 (I've been trying to get it upgraded for six months, but can't get things stable enough to feel ready for that).

I'm at a loss for what to do at this point. Before this, only two drives had ever died, both 1TB and quite old, and several months apart. Any help or suggestions?

Dylan
    I've seen clusters of drive failures in our Netapp filer (4 in one weekend is our record), probably a combination of them all being from the same manufacturing lot and of the drives getting hit a lot harder during a rebuild after a drive fails, which can cause another failure. But then we've got about 200 drives, about 10 times more than you have. Are you using good "enterprise" grade drives? I've seen cheap desktop drives fail in less than a year in 24x7 SAN usage. – Johnny Sep 05 '13 at 00:08
  • Are these actually drives designed for use in a RAID, or is it consumer drives that are crapping out on you? If it isn't enterprise hardware, then this is arguably 'normal'. If the drives were ordered all at the same time, having them fail at about the same time, when they were under the same workload, is not surprising. – Zoredache Sep 05 '13 at 00:10
  • What does "ECC error status" mean? What do the SMART stats say? – psusi Sep 05 '13 at 03:30
  • Can you please update the post with the drive manufacturer, part number and firmware version? Also, are you running the latest firmware for the adapter? Which backplane do you use? – GioMac Sep 05 '13 at 10:47
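To answer the SMART question for drives sitting behind the 3ware card, smartmontools can query each port through the controller with `smartctl -A -d 3ware,N /dev/twa0`, where N is the port number (9000-series cards such as the 9650SE use `/dev/twaN` device nodes; `/dev/twa0` assumes this is the first such controller). The minimal sketch below, built around a hypothetical helper `parse_smart_attributes`, collects the raw values of a few attributes often associated with media or link problems across all ports so they can be compared side by side. It is a data-gathering aid under those assumptions, not a diagnostic tool.

```python
import shutil
import subprocess
import sys

# Illustrative selection of attributes worth comparing across drives.
WATCH = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",  # interface/cable errors rather than media
    "Power_On_Hours",
)


def parse_smart_attributes(text: str) -> dict[str, str]:
    """Map attribute name -> RAW_VALUE from `smartctl -A` table output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            # RAW_VALUE is the last column and may contain spaces,
            # e.g. "30 (Min/Max 22/41)" for temperature.
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def read_port(port: int, device: str = "/dev/twa0") -> dict[str, str]:
    """Run smartctl against one 3ware port (needs root and smartmontools)."""
    result = subprocess.run(
        ["smartctl", "-A", "-d", f"3ware,{port}", device],
        capture_output=True,
        text=True,
    )
    return parse_smart_attributes(result.stdout)


if __name__ == "__main__":
    if shutil.which("smartctl") is None:
        print("smartctl not found; install smartmontools", file=sys.stderr)
    else:
        # Assumption: the 24-port card numbers its ports 0-23.
        for port in range(24):
            attrs = read_port(port)
            if attrs:  # empty ports produce no attribute table
                summary = ", ".join(f"{k}={attrs.get(k, '?')}" for k in WATCH)
                print(f"port {port}: {summary}")
```

Comparing the surviving drives in the failing array against the 2TB arrays, particularly pending and reallocated sector counts versus CRC errors, may help tell a bad batch of disks apart from a shared component such as a backplane slot or cable.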
