I have a MegaRAID 9260-8i with a 4-disk RAID 5 array. For six months or more the array has been performing beautifully, but recently the hardware alarm went off and the status showed one of the drives as missing.

I took the drive out and tested it, but didn't find anything wrong. I put the drive back and rebuilt the array. It worked fine for about another day, and then a different drive had the same issue. Wash, rinse, repeat, and wouldn't you know it, the issue just happened again.

I don't think it's a drive, because it doesn't happen on only one drive. There hasn't been any cable movement, so I'm hesitant to assume a bad cable. The hot-swap enclosure was pretty cheap, so it could be that. Really, I suspect the power supply, which is more than 5 years old, but I don't know how to verify that. Does anyone here know what to do?

Eric Fossum
  • Use software RAID. Hardware RAID is so 20th century. – Ipor Sircer Sep 09 '16 at 14:09
  • Is it always the same slot? – mzhaase Sep 09 '16 at 14:52
  • I've had issues with the multilane cables and with the backplane on some servers. The symptoms you describe could be either. – Aaron Sep 09 '16 at 15:13
  • @mzhaase - I don't move the drives, so different slots. – Eric Fossum Sep 09 '16 at 15:44
  • @Aaron - Would you recommend swapping cables then? I don't have backups, so I'll try running them without the hotswap rack first. – Eric Fossum Sep 09 '16 at 15:44
  • It may be a bad backplane, cables, or RAID controller. When the temperature changes, electrical contact may be lost if there is a hairline crack. – Mikhail Khirgiy Sep 09 '16 at 15:47
  • Is there any way to test it without replacing it? I don't want to replace anything that still works (although I did just order new power and data cables). – Eric Fossum Sep 09 '16 at 15:55
  • If you don't have support on this hardware, then all I can suggest is reseating everything and then replacing one thing at a time. There are too many variables for anyone here to guess with certainty. After a few years, folks sometimes write down the hardware and tech-refresh it to avoid going down this rabbit hole. Then you can repurpose the hardware for something that is lower priority and can withstand outages. – Aaron Sep 12 '16 at 17:44
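
The one-change-at-a-time approach suggested in the comments is easier to follow if each dropout is written down together with the slot, drive, cable, and power lead involved. The following is only a minimal sketch in Python of that bookkeeping: the `DropEvent` record, its field names, and the sample events are all hypothetical and not taken from this thread or from any controller tool. It reports which component, if any, was present in every recorded dropout.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DropEvent:
    """One 'drive missing' event, noted by hand (hypothetical record layout)."""
    slot: int          # enclosure bay the drive was sitting in
    drive_serial: str  # serial number of the drive that dropped
    cable: str         # label of the data cable feeding that slot
    power_lead: str    # label of the power lead feeding that slot


FIELDS = ("slot", "drive_serial", "cable", "power_lead")


def tally(events):
    """Count how often each value of each component appears across events."""
    return {f: Counter(getattr(e, f) for e in events) for f in FIELDS}


def suspects(events):
    """Return the components that were involved in every recorded dropout."""
    total = len(events)
    return {
        field: value
        for field, counter in tally(events).items()
        for value, n in counter.items()
        if n == total
    }


# Hypothetical data for illustration only.
events = [
    DropEvent(slot=0, drive_serial="SN-A", cable="cable-A", power_lead="lead-1"),
    DropEvent(slot=2, drive_serial="SN-C", cable="cable-A", power_lead="lead-1"),
    DropEvent(slot=1, drive_serial="SN-B", cable="cable-A", power_lead="lead-2"),
]

print(suspects(events))  # {'cable': 'cable-A'}
```

A component common to every event is only a lead, not proof. With so few events, the next step is still to swap that one part and see whether the dropouts stop.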

1 Answer

Well, I removed the drives from the hot-swap cage and changed the power cables. That seemed to fix it for a few days and I thought it was all better, but last night it happened again.

This time I silenced the alarm, and this morning something in there is ticking...

So after all that, even though the drives were only dropping out (never reported with a failed status), it seems there is a bad drive in there somewhere. I've since ordered two drives (one as a hot spare) and I'll be replacing the one I think is ticking. Hopefully I'll come back and post any further advice.

Eric Fossum