
I am having trouble replacing a disk on an existing zpool on a system running Solaris 10 on x86. The zpool was originally created with two mirrored slices. One of the drives failed, so I physically swapped it with a new drive. I ran prtvtoc and fmthard to copy the disk label from the working drive onto the new drive:

prtvtoc /dev/rdsk/c1t0d0s2 >/tmp/c1t0d0s2.out
fmthard -s /tmp/c1t0d0s2.out /dev/rdsk/c1t1d0s2
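
To sanity-check the copy, I believe the new drive's label can be dumped and diffed against the saved one. The comment header will differ (it names the device), but assuming the two drives have the same geometry, the partition table itself should match:

prtvtoc /dev/rdsk/c1t1d0s2 >/tmp/c1t1d0s2.out
diff /tmp/c1t0d0s2.out /tmp/c1t1d0s2.out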

Then I tried to bring the new drive online and got a warning about the device still being faulted:

$ zpool online pool c1t1d0s6 
warning: device 'c1t1d0s6' onlined, but remains in faulted state

The output of zpool status -v is:

NAME          STATE     READ WRITE CKSUM
pool          DEGRADED     0     0     0
  mirror-0    DEGRADED     0     0     0
    c1t0d0s6  ONLINE       0     0     0
    c1t1d0s6  UNAVAIL      0     0     0  corrupted data

(c1t1d0 is the replaced drive.)

Then I took c1t1d0s6 offline again and tried zpool replace, but that did not work either:

$ zpool replace pool c1t1d0s6
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c1t1d0s6 overlaps with /dev/dsk/c1t1d0s2

Does anyone know what's going on? Is it safe to use the '-f' flag?

Edit: After running zpool replace -f, zpool status shows:

  pool: pool
 state: DEGRADED
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

    NAME                STATE     READ WRITE CKSUM
    pool                DEGRADED     0     0     0
      mirror-0          DEGRADED     0     0     0
        c1t0d0s6        ONLINE       0     0     0
        replacing-1     UNAVAIL      0     0     0  insufficient replicas
          c1t1d0s6/old  OFFLINE      0     0     0
          c1t1d0s6      UNAVAIL      0   342     0  experienced I/O failures

I see errors on the new drive in iostat -e output. I guess the new drive might be bad, too?
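
For reference, the error counters I'm looking at come from iostat's device error statistics, e.g.:

iostat -En c1t1d0

which reports soft, hard, and transport error counts for the device.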

Edit 2: I don't know what's going on. I tried a different drive, following the same procedure. After zpool replace -f, the pool ran a scrub, but the status output is:

  pool: pool
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub completed after 12h56m with 0 errors on Wed Aug 29 06:49:16 2012
config:

    NAME                STATE     READ WRITE CKSUM
    pool                ONLINE       0     0     0
      mirror-0          ONLINE       0     0     0
        c1t0d0s6        ONLINE       0     0     0
        replacing-1     ONLINE   5.54M 19.9M     0
          c1t1d0s6/old  UNAVAIL      0     0     0  corrupted data
          c1t1d0s6      UNAVAIL      0     0     0  corrupted data

After offlining c1t1d0s6, the zpool status output is:

  pool: pool
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub completed after 12h56m with 0 errors on Wed Aug 29 06:49:16 2012
config:

    NAME                STATE     READ WRITE CKSUM
    pool                ONLINE       0     0     0
      mirror-0          ONLINE       0     0     0
        c1t0d0s6        ONLINE       0     0     0
        replacing-1     ONLINE   5.54M 19.9M     0
          c1t1d0s6/old  UNAVAIL      0     0     0  corrupted data
          c1t1d0s6      UNAVAIL      0     0     0  corrupted data

I don't get it. Shouldn't the system be able to resilver c1t1d0s6 from the healthy mirror half on c1t0d0s6?
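
If I understand the docs correctly (I haven't tried this yet), a half-finished replace like this can be backed out by detaching one side of the replacing vdev, e.g.:

zpool detach pool c1t1d0s6/old

and then the replace could be attempted again once the hardware is sorted out.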

slec

2 Answers


Did you clear the alerts in fmadm? And did you run zpool clear? It's safe to use the -f switch with zpool replace here; the overlap warning is expected, because slice 2 conventionally covers the entire disk, so every other slice "overlaps" it. That said, I think your zpool replace command is wrong unless you've already physically removed the bad disk: the single-argument form assumes the new disk sits in the same location as the one it replaces.

http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html
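
Roughly this sequence, with <fault-uuid> standing in for whatever fmadm faulty actually reports:

fmadm faulty                # list outstanding FMA faults
fmadm repair <fault-uuid>   # mark the fault as repaired
zpool clear pool            # reset the pool's error counters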

ewwhite

Check the cables, the drive sled, and the slot. A noisy SATA connection will yield errors, and fmadm uses that information to decide when a device is faulted. I've had drives I thought were bad when ZFS was simply noticing that it wasn't getting valid data reliably. In one case I found a pinched SATA cable; I replaced it, ran zpool clear and zpool scrub, and the errors were gone.
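
Concretely, after fixing the cable, the cleanup was just (substitute your pool name):

zpool clear pool
zpool scrub pool
zpool status -v pool

with the final status check confirming no new READ/WRITE/CKSUM errors.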

notpeter