2

Server: Ubuntu Lucid
RAID controller: Adaptec 3805
8 disks in RAID6 on HP Proliant DL180 G5 Hardware

My kern.log tells me that I have an error on sdb, as shown below:

[2740390.344436] sd 4:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[2740390.344439] sd 4:0:1:0: [sdb] Sense Key : Hardware Error [current]
[2740390.344442] sd 4:0:1:0: [sdb] Add. Sense: Internal target failure
[2740390.344447] sd 4:0:1:0: [sdb] CDB: Read(10): 28 00 33 dd dc 00 00 00 08 00
[2740390.344454] end_request: I/O error, dev sdb, sector 870177792
[2774094.573841] sd 4:0:1:0: [sdb] Unhandled sense code
[2774094.573847] sd 4:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[2774094.573851] sd 4:0:1:0: [sdb] Sense Key : Hardware Error [current]
[2774094.573856] sd 4:0:1:0: [sdb] Add. Sense: Internal target failure
[2774094.573862] sd 4:0:1:0: [sdb] CDB: Read(16): 88 00 00 00 00 01 33 dd ef e8 00 00 01 00 00 00
[2774094.573873] end_request: I/O error, dev sdb, sector 5165150184
[2774094.615437] sd 4:0:1:0: [sdb] Unhandled sense code

The arcconf command reports every disk as Online, but it also shows "Failed stripes : Yes".

How can I identify which disk is bad out of the 8-disk RAID6 array?

Amended: May 2nd 2012 - added the below:

/usr/local/sbin/arcconf getconfig 1 AL

Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status                        : Optimal
Channel description                      : SAS/SATA
Controller Model                         : Adaptec 3805
Controller Serial Number                 : 0C18115C3BB
Temperature                              : 0 C/ 32 F (Normal)
Installed memory                         : 128 MB
Copyback                                 : Disabled
Background consistency check             : Disabled
Automatic Failover                       : Enabled
Global task priority                     : High
Stayawake period                         : Disabled
Spinup limit internal drives             : 0
Spinup limit external drives             : 0
Defunct disk drive count                 : 0
Logical devices/Failed/Degraded          : 2/0/0
NCQ status                               : Enabled
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS                                     : 5.2-0 (17342)
Firmware                                 : 5.2-0 (17342)
Driver                                   : 1.1-5 (2461)
Boot Flash                               : 5.2-0 (17342)
--------------------------------------------------------
Controller Battery Information
--------------------------------------------------------
Status                                   : Optimal
Over temperature                         : No
Capacity remaining                       : 99 percent
Time remaining (at current draw)         : 3 days, 1 hours, 11 minutes

----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
Logical device name                      : boot
RAID level                               : 1
Status of logical device                 : Optimal
Size                                     : 476150 MB
Read-cache mode                          : Enabled
Write-cache mode                         : Enabled (write-back)
Write-cache setting                      : Enabled (write-back)
Partitioned                              : Yes
Protected by Hot-Spare                   : No
Bootable                                 : Yes
Failed stripes                           : No
Power settings                           : Disabled
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0                                : Present (0,7)             Z2AD1A3H
Segment 1                                : Present (0,3)             Z2AD1834

Logical device number 1
Logical device name                      : data
RAID level                               : 6 Reed-Solomon
Status of logical device                 : Optimal
Size                                     : 2858990 MB
Stripe-unit size                         : 128 KB
Read-cache mode                          : Enabled
Write-cache mode                         : Enabled (write-back)
Write-cache setting                      : Enabled (write-back)
Partitioned                              : Yes
Protected by Hot-Spare                   : No
Bootable                                 : No
Failed stripes                           : Yes
Power settings                           : Disabled
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0                                : Present (0,0)             6VPEFSZ0
Segment 1                                : Present (0,1)             5VPA5934
Segment 2                                : Present (0,2)             5VPA7132
Segment 3                                : Present (0,4)             5VPAJ8EJ
Segment 4                                : Present (0,5)             5VPA6NAZ
Segment 5                                : Present (0,6)             5VPAJM8Q


----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
  Device #0
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,0(0:0)
     Reported Location                  : Connector 0, Device 0
     Vendor                             : ST375052
     Model                              : 5AS
     Firmware                           : JC4B
     Serial number                      : 6VPEFSZ0
     Size                               : 715404 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #1
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,1(1:0)
     Reported Location                  : Connector 0, Device 1
     Vendor                             : ST375052
     Model                              : 5AS
     Firmware                           : JC4B
     Serial number                      : 5VPA5934
     Size                               : 715404 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #2
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,2(2:0)
     Reported Location                  : Connector 0, Device 2
     Vendor                             : ST375052
     Model                              : 5AS
     Firmware                           : JC4B
     Serial number                      : 5VPA7132
     Size                               : 715404 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #3
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,3(3:0)
     Reported Location                  : Connector 0, Device 3
     Vendor                             : ST500DM0
     Model                              : 02-1BD142
     Firmware                           : KC44
     Serial number                      : Z2AD1834
     Size                               : 476940 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #4
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,4(4:0)
     Reported Location                  : Connector 1, Device 0
     Vendor                             : ST375052
     Model                              : 5AS
     Firmware                           : JC4B
     Serial number                      : 5VPAJ8EJ
     Size                               : 715404 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #5
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,5(5:0)
     Reported Location                  : Connector 1, Device 1
     Vendor                             : ST375052
     Model                              : 5AS
     Firmware                           : JC4B
     Serial number                      : 5VPA6NAZ
     Size                               : 715404 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #6
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,6(6:0)
     Reported Location                  : Connector 1, Device 2
     Vendor                             : ST375052
     Model                              : 5AS
     Firmware                           : JC4B
     Serial number                      : 5VPAJM8Q
     Size                               : 715404 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #7
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,7(7:0)
     Reported Location                  : Connector 1, Device 3
     Vendor                             : ST500DM0
     Model                              : 02-1BD142
     Firmware                           : KC44
     Serial number                      : Z2AD1A3H
     Size                               : 476940 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled


Command completed successfully.

Update with added partition info below:

**fdisk -l**

Disk /dev/sda: 499.3 GB, 499289948160 bytes
255 heads, 63 sectors/track, 60701 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0002ab26

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1       59952   481562624   83  Linux
/dev/sda2           59953       60702     6022145    5  Extended
/dev/sda5           59953       60702     6022144   82  Linux swap / Solaris

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/sdb: 2997.9 GB, 2997878784000 bytes
255 heads, 63 sectors/track, 364471 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      267350  2147483647+  ee  GPT



**df -h**
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             453G  112G  319G  26% /
none                 1000M  224K 1000M   1% /dev
none                 1005M     0 1005M   0% /dev/shm
none                 1005M  664K 1004M   1% /var/run
none                 1005M  4.0K 1005M   1% /var/lock
none                 1005M     0 1005M   0% /lib/init/rw
/dev/sdb1             2.7T  1.5T  1.1T  58% /media/raid1
/dev/sdb1             2.7T  1.5T  1.1T  58% /media/usbhd-sdb1
/dev/sda1             453G  112G  319G  26% /media/usbhd-sda1


**fstab**
# /etc/fstab: static file system information.
#
# Use 'blkid -o value -s UUID' to print the universally unique identifier
# for a device; this may be used with UUID= as a more robust way to name
# devices that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    nodev,noexec,nosuid 0       0
# / was on /dev/sda1 during installation
UUID=12dd3c31-6dba-4c26-ba81-88a76510bffd /               ext4    errors=remount-ro 0               1
# swap was on /dev/sda5 during installation
UUID=81618042-ec4e-45e9-947f-9198d29651d3 none            swap    sw              0       0
UUID=a7832728-5bf9-45c4-8a29-2824b4f2c250 /media/raid1    ext4    errors=remount-ro,noatime 0       1
Pierre.Vriens
sixnumber
  • It's a hardware RAID controller, right? And the entire array is presented to the OS as /dev/sdb? (The smartctl output you pasted in a comment would seem to support that, what with the drive reporting that it's Adaptec branded.) The errors that the OS is reporting indicate hardware errors. Your RAID controller is supposed to hide such errors from the OS. If you're seeing such errors, the RAID controller has failed in some way. You may already have corrupted data. – wfaulk May 02 '12 at 08:53
  • Yeah, it's a hardware RAID controller - Adaptec 3805. – sixnumber May 02 '12 at 09:07
  • @sixnumber: have you found a solution for this problem? – zeldi Nov 25 '13 at 11:03

5 Answers

3

Unless I'm mistaken, these messages mean that errors are reaching the OS without having been corrected by the RAID controller. The RAID controller should be hiding errors like that from you. I don't think you have a simple disk failure. I think you have something more serious going on.

wfaulk
  • Sorry for my rubbish editing, but I've added more detail to my original question, including the full output of: /usr/local/sbin/arcconf getconfig 1 AL – sixnumber May 02 '12 at 08:46
3

Assuming that the volume "boot" in your RAID setup is recognized as sda and "data" as sdb, your system is telling you the following:

[2740390.344436] sd 4:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE

The SCSI subsystem issued a command to the low-level driver (for your Adaptec card) without error (hostbyte=DID_OK), and the card responded with an error (DRIVER_SENSE is set).

[2740390.344439] sd 4:0:1:0: [sdb] Sense Key : Hardware Error [current]

This is the type of error (see, e.g., the SCSI driver documentation).

[2740390.344442] sd 4:0:1:0: [sdb] Add. Sense: Internal target failure

This is the additional sense information the driver reports. As far as I know, it means "no specific information" / "no idea what went wrong".

[2740390.344454] end_request: I/O error, dev sdb, sector 870177792

The error has reached the block layer.
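To make the link between the logged CDB and the reported sector concrete, here is a minimal Python sketch (not part of the original answer) that decodes the two CDBs from the kern.log excerpt in the question. It assumes the standard READ(10) and READ(16) command layouts; the function name `decode_read_cdb` is made up for illustration.

```python
from typing import Tuple


def decode_read_cdb(cdb_hex: str) -> Tuple[str, int, int]:
    """Decode a SCSI READ(10) or READ(16) CDB given as space-separated hex bytes.

    Returns (command name, starting LBA, transfer length in blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    opcode = cdb[0]
    if opcode == 0x28 and len(cdb) == 10:
        # READ(10): bytes 2-5 hold a 32-bit LBA, bytes 7-8 the transfer length.
        lba = int.from_bytes(cdb[2:6], "big")
        blocks = int.from_bytes(cdb[7:9], "big")
        return "READ(10)", lba, blocks
    if opcode == 0x88 and len(cdb) == 16:
        # READ(16): bytes 2-9 hold a 64-bit LBA, bytes 10-13 the transfer length.
        lba = int.from_bytes(cdb[2:10], "big")
        blocks = int.from_bytes(cdb[10:14], "big")
        return "READ(16)", lba, blocks
    raise ValueError("not a READ(10) or READ(16) CDB: " + cdb_hex)


# The two CDBs copied from the kern.log lines in the question.
for logged in (
    "28 00 33 dd dc 00 00 00 08 00",
    "88 00 00 00 00 01 33 dd ef e8 00 00 01 00 00 00",
):
    name, lba, blocks = decode_read_cdb(logged)
    print(name, "LBA", lba, "blocks", blocks)
```

The decoded LBAs are 870177792 and 5165150184, the same sectors named in the two `end_request: I/O error` lines. The second value does not fit in the 32-bit LBA field of READ(10). Both addresses refer to the logical device sdb that the controller presents, not to a sector on any one physical disk.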

As stated in another answer: this is not a single disk failure, this is a failure of the whole RAID. You should check your data carefully and consider replacing the RAID subsystem, or at least the controller.

And you should always(!) enable "Background consistency check" / "Passive Scan" / "Verify" on your RAID controllers to find silent corruption, which may otherwise kill your RAID during a rebuild.

Did you see any filesystem errors? Is /dev/sdb partitioned / mounted?

tim
  • Interesting - thanks for the further info. I'm not seeing any filesystem errors in the messages or kern.log - only the messages in my original post. I'll add the disk partition details above to my original post. It's strange because data is being rsync'd to this server daily and the data is consistent (for now!) with the location it was rsync'd from. – sixnumber May 02 '12 at 10:37
  • Do you really use rsync in checksum mode? Otherwise you cannot make data integrity assumptions based on its results. – tim May 02 '12 at 11:25
  • Ah, no, I just manually checksum'd a few files - not ideal and no guarantee I know but just to ease my conscience a little. – sixnumber May 02 '12 at 11:40
  • I still don't get what's going on. The output of the 'arcconf' command is showing Controller Status : Optimal. All disks are present and online. It's just the message in the logs about 'I/O error, dev sdb, sector 5165150184' & 'I/O error, dev sdb, sector 870177792' that is throwing me. The server seems to be running fine as does bacula and the backups are running ok. I'm reluctant to reboot the machine for now until I have a plan in place. – sixnumber May 03 '12 at 09:45
  • Story of my life "no idea what went wrong". I guess I'll look into replacing the controller and rebuilding the server. Thanks all for your time and effort. – sixnumber May 03 '12 at 10:01
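On the checksum point raised in the comments above: by default rsync decides what to transfer from file size and modification time, and only compares file contents when run with `--checksum`. As a rough illustration of a content-level check, the following Python sketch hashes every file under two directory trees and lists the relative paths whose contents differ or that exist on only one side. It assumes both trees are reachable as local paths (for example over a network mount); the function names are hypothetical and not from the thread.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def differing_files(source_root, copy_root):
    """List relative paths that are missing on one side or differ in content."""
    source = tree_digests(source_root)
    copy = tree_digests(copy_root)
    return sorted(
        name
        for name in source.keys() | copy.keys()
        if source.get(name) != copy.get(name)
    )
```

Calling `differing_files("/path/to/source", "/media/raid1/some/dir")` with real paths substituted returns an empty list only if every file matches byte for byte. Both path arguments here are placeholders.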
1

This will sound funny, but did you look on the front of the server to see which drive had an error LED lit up? (assuming the drives have LEDs)

Also, you can install the storage manager software: http://www.adaptec.com/en-us/downloads/storage_manager/sm/productid=sas-3805&dn=adaptec+raid+3805.html

TheCleaner
  • :-) Yes, I did look on the front of the server but there aren't any LEDs lit, bizarrely. If I recall correctly the last time the server was restarted they briefly flashed on startup. – sixnumber May 01 '12 at 15:09
  • The flip side of that is that if you do have a failed drive, its LED will *not* light up under heavy writes. But be careful, because your parity disks might not look like they're lighting up depending on the exact workload, and during reads they almost definitely won't light up. – Mark Henderson May 01 '12 at 21:55
  • @Mark, I'm talking about an error LED, not activity LED. – TheCleaner May 02 '12 at 14:41
0

It's possible you could get the information via smartctl (CLI) or Adaptec's CLI (as mentioned above).
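As a sketch of the Adaptec CLI route, the Python snippet below parses the "Physical Device information" part of the `arcconf getconfig 1 AL` output and lists any device whose State is not Online or whose S.M.A.R.T. warning count is not 0. It assumes the text layout shown in the question; the sample is two device entries copied from that output, and the function names are illustrative only.

```python
import re

# Two device entries copied from the arcconf output in the question.
SAMPLE = """\
  Device #0
     Device is a Hard drive
     State                              : Online
     Reported Channel,Device(T:L)       : 0,0(0:0)
     Serial number                      : 6VPEFSZ0
     S.M.A.R.T. warnings                : 0
  Device #3
     Device is a Hard drive
     State                              : Online
     Reported Channel,Device(T:L)       : 0,3(3:0)
     Serial number                      : Z2AD1834
     S.M.A.R.T. warnings                : 0
"""


def parse_physical_devices(arcconf_text):
    """Collect the 'key : value' lines that follow each 'Device #N' header."""
    devices = []
    current = None
    for line in arcconf_text.splitlines():
        header = re.match(r"\s*Device #(\d+)\s*$", line)
        if header:
            current = {"Device": int(header.group(1))}
            devices.append(current)
        elif current is not None and " : " in line:
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
    return devices


def suspect_devices(devices):
    """Return devices that are not Online or that report S.M.A.R.T. warnings."""
    return [
        dev
        for dev in devices
        if dev.get("State") != "Online"
        or dev.get("S.M.A.R.T. warnings", "0") != "0"
    ]


devices = parse_physical_devices(SAMPLE)
for dev in devices:
    print(dev["Device"], dev["Serial number"], dev["State"])
print("suspects:", suspect_devices(devices))
```

For the output posted in the question, every device is Online with 0 warnings, so this filter flags nothing. That is consistent with the other answers: the controller itself is not pointing at any single disk.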

thinice
  • I'm not sure SMART is enabled (sorry if I'm being a bit dumb but I struggle with disks and storage). This is shown in the arcconf command for all 8 disks: S.M.A.R.T. : No S.M.A.R.T. warnings : 0 – sixnumber May 01 '12 at 15:15
0

If you can reboot the server, boot it from the SmartStart DVD. If I remember correctly, you can access the ACU from there to get a graphical view of the RAID volumes.

Danilo Brambilla