Looking at the logs, I see several errors. The SMART status is PASSED, but even though the output is verbose, it isn't clear what is going on. Should the disk be replaced?

The dmesg and SMART logs are attached below.

# dmesg
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x64
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:61:5d:64/00:00:5f:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:63:5d:64/40:00:5f:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        5f 64 5d 63 
sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
sd 0:0:0:0: [sda] CDB: Read(10): 28 00 5f 64 5d 61 00 00 08 00
ata1: EH complete 



# smartctl -x /dev/sda
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12 family
Device Model:     ST31000528AS
Serial Number:    9VP8WA3X
Firmware Version: CC38
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sun Feb 10 16:13:22 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

....

Error 36 [15] occurred at disk power-on lifetime: 16731 hours (697 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 5f 5d 00 64 63 00 00  Error: UNC at LBA = 0x5f5d006463 = 409582199907



SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            8  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
grigio
  • Your post is far too long (there is a 30k limit). Do you really think that someone will bother to read that much log file output? It seems likely that the most relevant information, which will be the newest, will be missing from the bottom. You need to trim your logs down to a reasonable size. – user9517 Feb 10 '13 at 15:51

1 Answer

Congratulations, you've just had your first unrecoverable read error.

sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed

This means the drive was unable to read from a sector of the disk, and in this case, was also unable to reallocate a good sector to take the place of the bad sector.
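If you want to know which sector failed, the `res` line in dmesg carries it. Here is a minimal Python sketch that decodes it, assuming the usual libata layout of that line (status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device). The function name `decode_res_lba` is made up for this example, and the byte offset assumes 512-byte logical sectors.

```python
def decode_res_lba(line: str) -> int:
    """Return the 48-bit LBA encoded in a libata 'res' line.

    Assumed field layout (from the libata error handler output):
    status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    # Take the first whitespace-delimited token after the literal "res".
    token = line.split("res", 1)[1].split()[0]
    _status, low, high, _device = token.split("/")

    # Low-order register bytes: the last three are LBA bits 0..23.
    _error, _nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    # High-order ("hob") register bytes: the last three are LBA bits 24..47.
    _hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":")
    )

    return (
        hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
        | lbah << 16 | lbam << 8 | lbal
    )


if __name__ == "__main__":
    res_line = "res 51/40:00:63:5d:64/40:00:5f:00:00/00 Emask 0x9 (media error)"
    lba = decode_res_lba(res_line)
    print(f"failing LBA: {lba} (0x{lba:x})")
    # Assumption: 512-byte logical sectors.
    print(f"byte offset: {lba * 512}")
```

For the line in the question this gives LBA 0x5f645d63, which matches the `5f 64 5d 63` bytes in the sense data and falls inside the 8-sector read starting at 0x5f645d61 shown in the CDB. The LBA printed by smartctl in the question is different and larger than a 1 TB drive has sectors, so the kernel's value is probably the one to trust.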

A URE can cause the drive to appear failed to a RAID controller and put your RAID array into a degraded state. Once the array has no redundancy left, the next read error on any remaining disk can cost you all your data. So you should replace the disk immediately. (And woe to you if this isn't in a RAID...)

Despite the "pass" from SMART, you should be able to get a warranty replacement when your drive is showing any number of unrecoverable read errors.

Michael Hampton