I have a dedicated server with 4 HDDs in a hardware RAID 10 configuration. It worked fine until yesterday, when it started crashing randomly every couple of minutes. I contacted my data center; they ran system diagnostics and found that one of the HDDs in the RAID 10 array was defective. They replaced the drive and it started rebuilding automatically. Then they booted the system in normal mode, and it worked normally for 15 minutes before it started crashing again. I ran a couple of diagnostics of my own, and when I checked the state of the physical drives with:
arcconf GETCONFIG 1 PD
I noticed that HDD 0,0 had S.M.A.R.T. errors. I reported that to my DC; they confirmed it and asked to swap that drive for a new one, but suggested that I back up my data (~2 TB) first because data loss was very likely. I backed up my data and then they replaced the second HDD. After booting, they had to force-start the RAID controller, and the system booted in recovery mode. I think they swapped the wrong drive the first time, because it’s highly unlikely for two drives in different mirror sets to fail at the same time, but that is another story… My problem is that the second replaced HDD isn’t rebuilding itself. I tried to clear the metadata for that drive with:
arcconf TASK START 1 DEVICE 0 0 CLEAR
and then set the drive’s state to hot spare with
arcconf SETSTATE 1 DEVICE 0 0 HSP LOGICALDRIVE 0
so that it would begin the rebuild process automatically, but without success.
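For reference, here is a minimal sketch of how the rebuild state can be checked after the two commands above. It assumes controller 1, as in my commands. GETSTATUS and GETCONFIG ... LD are regular arcconf subcommands; the ARCCONF variable and the function name show_rebuild_state are only illustrative names of mine.

```shell
#!/bin/sh
# Sketch: show whether controller 1 is running a background task such as a rebuild.
# ARCCONF and show_rebuild_state are illustrative names, not part of arcconf.
ARCCONF="${ARCCONF:-arcconf}"
CONTROLLER=1

show_rebuild_state() {
    # Lists background tasks currently running on the controller.
    "$ARCCONF" GETSTATUS "$CONTROLLER" || return 1
    # Prints the logical device configuration, including its status.
    "$ARCCONF" GETCONFIG "$CONTROLLER" LD
}
```

Calling show_rebuild_state right after the SETSTATE command should show whether the controller has picked up a rebuild task.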
My RAID 10 array consists of 4 HDDs: HDD 0,0 and HDD 0,1 are in one mirror set, and HDD 0,2 and HDD 0,3 are in another.
The output of the logical device state (arcconf getconfig 1 ld):
https://dl.dropbox.com/u/10839791/ld.txt
The output of the physical drive state (arcconf GETCONFIG 1 PD):
https://dl.dropbox.com/u/10839791/pd.txt
Controller status:
https://dl.dropbox.com/u/10839791/controller.txt
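In case anyone wants to compare against their own controller, this is roughly how the three dumps above can be collected. It is a sketch, assuming controller 1 and assuming the controller status comes from GETCONFIG ... AD; the ARCCONF variable, the function name collect_raid_state, and the output directory are illustrative.

```shell
#!/bin/sh
# Sketch: save the logical device, physical drive, and controller dumps to files.
# ARCCONF and collect_raid_state are illustrative names, not part of arcconf.
ARCCONF="${ARCCONF:-arcconf}"
CONTROLLER=1

collect_raid_state() {
    outdir="$1"
    mkdir -p "$outdir" || return 1
    # Stop at the first failing command so a partial set of files is noticed.
    "$ARCCONF" GETCONFIG "$CONTROLLER" LD > "$outdir/ld.txt" &&
    "$ARCCONF" GETCONFIG "$CONTROLLER" PD > "$outdir/pd.txt" &&
    # Assumption: AD (adapter information) corresponds to the controller status dump.
    "$ARCCONF" GETCONFIG "$CONTROLLER" AD > "$outdir/controller.txt"
}
```

For example, collect_raid_state ./raid-state writes ld.txt, pd.txt, and controller.txt into ./raid-state.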
My question is: is there any way to make that drive rebuild itself without losing any data?
Thanks.