
I have a btrfs RAID 6 built on lousy hard drives: one drive failed, and another failed during recovery. Now I am without parity and cannot rebuild, because a third drive is throwing read errors on a few sectors. Since I can't remove that drive, I overwrote its bad sectors with zeros using dd. Now during the rebuild I get a few errors like this:
BTRFS info (device sdc): csum failed ino 257 off 3985240064 csum 2566472073 expected csum 1136819032
...then
kernel BUG at /build/linux-SMWX37/linux-3.12.9/fs/btrfs/extent_io.c:2082!
...the rebuild stops
I think btrfs doesn't know what to do without any parity for repair, so it throws an error and aborts.

I figured I could just delete the affected files, but inode 257 does not map to anything.
btrfs inspect-internal inode-resolve -v 257 /data
ioctl ret=-1, error: No such file or directory

Any suggestions for manually clearing inode 257, or otherwise repairing my filesystem?
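
For reference, the sector zeroing was done roughly along these lines (the device name, offset, and count below are placeholders, not the actual values):

# Overwrite the unreadable sectors in place with zeros.
# /dev/sdX, the seek offset, and the count are placeholders.
dd if=/dev/zero of=/dev/sdX bs=512 seek=<first-bad-sector> count=<number-of-sectors> conv=notrunc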

Jacob Stoner
  • Is this software RAID or hardware RAID? I would not have overwritten the bad sectors with zeros, but it may be too late to do anything about that now. Did the RAID rebuild complete? Corruption detected by the file system shouldn't prevent the RAID from proceeding with the rebuild. – kasperd Aug 16 '14 at 19:21
  • It's btrfs raid, so it's a software raid but much different than md raid. The corruption does indeed prevent rebuild. No choice about the bad sectors, they were beyond recovery and were causing btrfs to panic--and it was only eight kilobytes in the middle of the drive. – Jacob Stoner Aug 16 '14 at 19:35
  • I would have expected a file system level RAID to have more graceful degradation than that. I can't give you much advice here. But maybe it is possible to copy the good files from that file system to another file system. – kasperd Aug 16 '14 at 19:53
  • Agreed! It seems btrfs RAID is still somewhat incomplete. I may have to follow your advice, but I was hoping to avoid it--it's a 12TB filesystem. – Jacob Stoner Aug 16 '14 at 20:00
  • There is no stable version of BTRFS, this site is for professional sysadmins who inherently aim to use supportable and stable products - thus this question is not relevant to this site. – Chopper3 Aug 16 '14 at 20:14
  • can you suggest the correct site? – Jacob Stoner Aug 16 '14 at 20:22
  • @Chopper3 A rock stable file system on top of RAID6 at the block layer wouldn't have done much better in case of a triple disk failure. I have reasons to consider btrfs not ready for production yet, but this is not one of them. – kasperd Aug 16 '14 at 20:23
  • This is pretty much unrecoverable. It's also not btrfs's fault. You need to do two things: Buy more reliable hard drives, and restore from your backups. – Michael Hampton Aug 17 '14 at 13:18

1 Answer


If a RAID array has three failing drives, the probability of getting the raidset back into service is very low. Sorry.

I'm afraid your only alternative is to replace the failing disks, recreate the raidset, and then restore the data from your most recent backup set.
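
As a rough sketch of that recovery path (device names, mount point, and backup location are placeholders, and it assumes you stay on btrfs RAID 6):

# Recreate the btrfs RAID 6 array on the replacement disks (placeholder device names)
mkfs.btrfs -f -d raid6 -m raid6 /dev/sdb /dev/sdc /dev/sdd /dev/sde
mount /dev/sdb /data
# Restore the data from the most recent backup (placeholder backup path)
rsync -aHAX /mnt/backup/ /data/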

Btrfs is still relatively experimental, so I presume you are prepared for this situation by keeping good backups.

If you want something more stable, I'd advise using the proven ext4 filesystem instead of the more experimental btrfs.

mdpc
  • After more study, I have to agree that this is the right answer. There are some great reasons to prefer btrfs, but this appears to be one of the edge cases that is not well handled. Fortunately I can still mount it read-only and recover most of the data. – Jacob Stoner Aug 17 '14 at 00:47
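
A minimal sketch of that read-only recovery, assuming the array still accepts a degraded mount (device name and paths are placeholders):

# Mount the damaged array read-only; "degraded" allows mounting with devices missing
mount -o ro,degraded /dev/sdc /mnt/broken
# Copy whatever is still readable to a healthy filesystem; rsync continues past
# individual read errors and reports the failed files at the end
rsync -aHAX /mnt/broken/ /mnt/rescue/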