2

I have a software RAID1, and now every week Linux synchronizes my RAID volume.

I checked cat /proc/mdstat:

       Personalities : [raid1]
       md3 : active raid1 sda5[0] sdb5[1]
             1822445428 blocks super 1.0 [2/2] [UU]

       md1 : active raid1 sdb2[1] sda2[0]
             524276 blocks super 1.0 [2/2] [UU]

       md2 : active raid1 sda3[0] sdb3[1]
             1073741688 blocks super 1.0 [2/2] [UU]
             [============>........]  check = 61.9% (665688192/1073741688) finish=203.8min speed=33367K/sec

       md0 : active raid1 sda1[0] sdb1[1]
             33553336 blocks super 1.0 [2/2] [UU]

       unused devices: <none>

That seems to be normal. But I checked /var/log/messages and found:

41/40:80:20:48:c3/00:00:04:00:00/00 Emask 0x409 (media error) <F>
May 26 10:45:45 CentOS-62-64-minimal kernel: ata2.00: status: { DRDY ERR }
May 26 10:45:45 CentOS-62-64-minimal kernel: ata2.00: error: { UNC }
May 26 10:45:45 CentOS-62-64-minimal kernel: ata2.00: configured for UDMA/133
May 26 10:45:45 CentOS-62-64-minimal kernel: ata2: EH complete
May 26 10:45:48 CentOS-62-64-minimal kernel: ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
May 26 10:45:48 CentOS-62-64-minimal kernel: ata2.00: irq_stat 0x40000008
May 26 10:45:48 CentOS-62-64-minimal kernel: ata2.00: failed command: READ FPDMA QUEUED
May 26 10:45:48 CentOS-62-64-minimal kernel: ata2.00: cmd 60/80:00:00:48:c3/00:00:04:00:00/40 tag 0 ncq 65536 in
May 26 10:45:48 CentOS-62-64-minimal kernel:         res 41/40:80:20:48:c3/00:00:04:00:00/00 Emask 0x409 (media error) <F>
May 26 10:45:48 CentOS-62-64-minimal kernel: ata2.00: status: { DRDY ERR }
May 26 10:45:48 CentOS-62-64-minimal kernel: ata2.00: error: { UNC }
May 26 10:45:48 CentOS-62-64-minimal kernel: ata2.00: configured for UDMA/133
May 26 10:45:48 CentOS-62-64-minimal kernel: sd 1:0:0:0: [sdb] Unhandled sense code
May 26 10:45:48 CentOS-62-64-minimal kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 26 10:45:48 CentOS-62-64-minimal kernel: sd 1:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]

Any help with that? What could be causing this, and what do I need to do?

Sven
webgeek

2 Answers

2

Usually these errors mean that the drive (/dev/sdb in your case) might be failing soon.

You can use smartctl to do an extended S.M.A.R.T. self test to see if any error comes up.

You can do an extended check by running

smartctl -t long /dev/sdb

The test runs in the background on the drive and can take several hours. Once it has finished, you can view the results (and much more info as well) by running

smartctl -a /dev/sdb

Also look at the Reallocated_Sector_Ct and Offline_Uncorrectable values. Both should be 0. If Reallocated_Sector_Ct is > 0, the drive has already remapped some hard-to-read sectors to its spare area (this is normal behavior, but it indicates that the drive might fail soon). If Offline_Uncorrectable is > 0, the drive is failing.
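To pull just those two values out of the attribute table, a small filter can help. This is only a minimal sketch: it assumes the usual `smartctl -A` table layout (attribute name in the second column, raw value in the tenth; smartctl spells the first attribute Reallocated_Sector_Ct), and `check_smart_attrs` is a hypothetical helper name, not something shipped with smartmontools.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools).
# Reads `smartctl -A` output on stdin, prints NAME=RAW_VALUE for the two
# attributes of interest, and returns 1 if either raw value is above 0.
# Assumption: name is column 2 and the raw value is column 10.
check_smart_attrs() {
    awk '
        BEGIN { bad = 0 }
        $2 == "Reallocated_Sector_Ct" || $2 == "Offline_Uncorrectable" {
            print $2 "=" $10
            if ($10 + 0 > 0) bad = 1
        }
        END { exit bad }
    '
}

# Typical use (as root):
#   smartctl -A /dev/sdb | check_smart_attrs || echo "check this drive"
```

If the raw value on your drive is not a plain number (some vendors append extra text), the column assumption will not hold and the full `smartctl -a` output should be read by hand instead.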

Please post the results of smartctl -a /dev/sdb.

If smartctl is not installed you can install it by running

yum install smartmontools -y

The errors you got could also be due to a faulty power supply to the drive or a faulty SATA cable.

Cha0s
  • Thanks, my SMART log for sdb: http://textuploader.com/?p=6&id=xWjN7 – webgeek May 26 '13 at 17:04
  • It looks like your drive is going to fail really soon. Replace it as soon as possible to avoid any unexpected delays or even a server crash. – Cha0s May 26 '13 at 17:09
  • This tutorial can help you replace the drive and add the new back to the RAID1 array. http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array – Cha0s May 26 '13 at 17:10
1

It seems your sdb drive is failing. Replace it before it's too late. You can use the SMART tools to confirm that diagnosis, with a command such as:

sudo smartctl -q errorsonly -H -l selftest /dev/sdb
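With `-q errorsonly` a healthy drive prints nothing, so the exit status is worth checking too. smartctl returns a bit mask, and the sketch below decodes it using the bit meanings from the EXIT STATUS section of the smartctl man page. `decode_smartctl_status` is a hypothetical helper name, and the messages are paraphrased rather than quoted.

```shell
#!/bin/sh
# Hypothetical helper: print one line per bit set in a smartctl exit status.
# Bit meanings paraphrased from the EXIT STATUS section of smartctl(8).
decode_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART or ATA command failed, or checksum error" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "some attributes were at or below threshold in the past" \
        "device error log contains records of errors" \
        "self-test log contains records of errors"
    do
        # Test whether this bit is set in the status value.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}

# Typical use, right after the smartctl command for the drive in question:
#   sudo smartctl -q errorsonly -H -l selftest /dev/sdb
#   decode_smartctl_status $?
```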