1

We are facing a failure of one of our Solaris/ZFS file servers. The hardware won't boot; we are not that worried about the server itself, but the data is precious. We took the set of 8 drives that form a raidz2 pool out of it and attached them to another Solaris machine. The pool was recognised and the data was intact, but after about six hours the pool suddenly became unavailable. We power-cycled the machine, without luck.

I would appreciate it if you could help us recover the data.

With the `zpool import` command we get the following error:

zpool import
  pool: p1z2
    id: 16004911417686972288
state: UNAVAIL
status: One or more devices are unavailable.
action: The pool cannot be imported due to unavailable devices or data.
        The pool may be active on another system, but can be imported using the '-f' flag.
config:
        p1z2                       UNAVAIL  corrupted data
          raidz2-0                 DEGRADED
            c0t5000C500959EC8DFd0  UNAVAIL  cannot open
            c0t5001B4D04D00A816d0  ONLINE
            c0t5001B4D04D1E6803d0  ONLINE
            c0t5000C50083375403d0  UNAVAIL  cannot open
            c0t5001B4D04D1F0807d0  ONLINE
            c0t5001B4D04D101812d0  ONLINE
            c0t5001B4D04D101817d0  ONLINE
            c0t5001B4D04D233806d0  ONLINE

device details:
        c0t5000C500959EC8DFd0    UNAVAIL          cannot open
        status: ZFS detected errors on this device.
                The device was missing.
        c0t5000C50083375403d0    UNAVAIL          cannot open
        status: ZFS detected errors on this device.
                The device was missing.

iostat -en

  ---- errors ---
  s/w h/w trn tot device
    0   0   0   0 c0t5001B444A4E76FA2d0
    0   0   0   0 c0t5001B4D04D090800d0
    0   0   0   0 c0t5001B4D04D08D801d0
    0   0   0   0 c0t5001B4D04D233802d0
    0   0   0   0 c0t5001B4D04D1E6803d0
    0   0   0   0 c0t5001B4D04D080804d0
    0   0   0   0 c0t5001B4D04D101805d0
    0   0   0   0 c0t5001B4D04D233806d0
    0   0   0   0 c0t5001B4D04D1F0807d0
    0   0   0   0 c0t5001B4D04D080810d0
    0   0   0   0 c0t5001B4D04D080811d0
    0   0   0   0 c0t5001B4D04D101812d0
    0   0   0   0 c0t5001B4D04D00B813d0
    0   0   0   0 c0t5001B4D04D080814d0
    0   0   0   0 c0t5001B4D04D1E6815d0
    0   0   0   0 c0t5001B4D04D00A816d0
    0   0   0   0 c0t5001B4D04D101817d0
ewwhite
  • You appear to have two dead drives. – Michael Hampton Nov 23 '16 at 18:21
  • But it is a raidz2 pool; with two drives dead it should survive – Sumit Saluja Nov 23 '16 at 18:37
  • OK, so you should try what it already suggested to you. – Michael Hampton Nov 23 '16 at 18:41
  • I am not able to import this pool. Any suggestions on how I can import it? – Sumit Saluja Nov 23 '16 at 18:50
  • Without any additional parameters, `zpool import` simply gives you a list of importable pools. Which clearly says that the pool is marked as not exported, and can be imported if you use `-f`. **What happens if you do that?** Also, keep in mind that the pool is *at* its redundancy threshold; ZFS will be hard pressed to correct for any errors or faults. You may want to **image the disks and work with copies.** – user Dec 09 '16 at 13:23
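
A minimal sketch of the imaging-and-read-only-import approach suggested in the comment above; the image paths below are placeholders, and the device node may need a slice suffix (e.g. s0) depending on how the disks are labeled:

# Image each pool member first, so all further experiments run against copies
dd if=/dev/rdsk/c0t5001B4D04D00A816d0 of=/backup/c0t5001B4D04D00A816d0.img bs=1024k

# Force the import; read-only (if your ZFS release supports it) keeps ZFS from
# writing to a pool that is already at its redundancy limit
zpool import -f -o readonly=on p1z2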

1 Answer

2

I would not attempt to provide an answer without knowing more about the state of your pool. I recommend running the ZFS debug utility against the pool. It should provide additional information that can help determine why the host refuses to import the degraded pool (although it's telling you the data is corrupted, you may still be able to rewind to a point where you can recover).
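
If a plain forced import keeps failing, one possible avenue is ZFS's rewind recovery mode; a minimal sketch, assuming your release's zpool import supports the -F and -n options:

# Dry run: report whether discarding the last few transactions would make the
# pool importable, without actually modifying anything
zpool import -F -n p1z2

# If the dry run looks acceptable, attempt the actual recovery import
zpool import -F p1z2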

Disclaimer: zdb is essentially an internal support tool for Oracle tech support. Using its various options without understanding their effect could make matters worse.

In the example below, the `-e` option tells zdb to operate on a pool that is not currently imported.

zdb -e p1z2 | tee /tmp/zdb.log

Please note, this can take a very long time to run, depending on the size of your pool and its utilization. I just ran this against a healthy-but-exported 1.4TB pool I use with my Solaris 10U10 (+ latest CPU patches) system. That pool is 79% full, and zdb is still running its metadata checksums after 40 minutes (and I'm at the end of my day, so I'm not sticking around to see it finish). The output can be immense, which is why I suggest logging it to a file using tee.
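
If zdb instead fails almost immediately with checksum errors, it may also be worth checking whether the two drives reported as "cannot open" present valid ZFS labels to the new host at all; a sketch, assuming the devices show up under /dev/rdsk (the slice suffix may differ, and the commands will simply fail if the disks are not visible to the OS):

# Dump the four ZFS labels from each drive the pool reports as missing
zdb -l /dev/rdsk/c0t5000C500959EC8DFd0s0
zdb -l /dev/rdsk/c0t5000C50083375403d0s0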

Delinth
  • I ran this command and it completed within seconds, even though the pool was 97% full (about 30TB) before it died. It shows this output: MOS Configuration: zdb: dmu_read(24) failed, errno 50: Checksum failure – Sumit Saluja Nov 28 '16 at 18:39