
I'm running a three-disk BTRFS RAID5 filesystem on my Ubuntu 18.04.2 server.

One of the drives started to give errors and the filesystem kept remounting as read-only. I tried btrfs scrub, but it did not work; the filesystem was remounted as read-only again. I also tried btrfs rescue zero-log. So I wanted to replace the faulty drive. I had read that one could btrfs device add a new drive and then delete the old one from the array, instead of using btrfs replace, which apparently had some issues.
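For reference, the scrub and zero-log attempts looked roughly like this (/mnt/array stands in for my actual mount point):

$ sudo btrfs scrub start /mnt/array
$ sudo btrfs rescue zero-log /dev/mapper/luks-*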

So I tried that, but got errors. Every time I ran btrfs device delete <device>, the filesystem remounted as read-only. The same thing happened when I physically disconnected the drive and ran btrfs device delete missing. No change: still read-only.
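The sequence I attempted was roughly the following (device paths and mount point are placeholders):

$ sudo btrfs device add /dev/mapper/luks-new /mnt/array
$ sudo btrfs device delete /dev/mapper/luks-old /mnt/array
$ sudo btrfs device delete missing /mnt/array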

I did mount the filesystem with -o usebackuproot and -o degraded.
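That is, roughly (again with a placeholder mount point):

$ sudo mount -o degraded,usebackuproot /dev/mapper/luks-* /mnt/array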

I ran a chunk-recover, but the server shut down. I tried it multiple times, since by then I could not even mount the array any more. I did, however, get one run to finish by running it without the faulty drive connected. But that chunk recovery failed, and now I'm not even able to mount the RAID at all.
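The chunk-recover invocation was roughly:

$ sudo btrfs rescue chunk-recover -v /dev/mapper/luks-*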

When mounting, the output states can't read superblock on <device>, but btrfs rescue super-recover states that All supers are valid, no need to recover.
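For completeness, that super-recover check looks like this:

$ sudo btrfs rescue super-recover /dev/mapper/luks-*
All supers are valid, no need to recover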

When trying to mount, dmesg states:

[54605.499604] BTRFS info (device dm-7): bdev /dev/mapper/luks-* errs: wr 29664, rd 30074, flush 32, corrupt 0, gen 31
[54605.525654] BTRFS error (device dm-7): parent transid verify failed on 38977536 wanted 82072 found 82114
[54605.526827] BTRFS error (device dm-7): parent transid verify failed on 38977536 wanted 82072 found 82114
[54605.526847] BTRFS warning (device dm-7): failed to read fs tree: -5
[54605.553948] BTRFS error (device dm-7): open_ctree failed

Restore does work after connecting the faulty drive again with:

sudo btrfs restore -u 2 -vvv -ixm /dev/mapper/luks-* /mnt/pnt
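(As far as I understand the flags: -u 2 uses the given superblock mirror, -i ignores errors, -x restores extended attributes, and -m restores file metadata such as owner, mode, and times.)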

I have snapshots of one specific directory, the most important one, but they are on the failed RAID array. I have also (I think) backed up everything to another drive.
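If needed, I could presumably also pull out just that one directory with restore's --path-regex option, something like the following ('important' and /mnt/backup are placeholders for my actual directory and target):

$ sudo btrfs restore -ivx --path-regex '^/(|important(|/.*))$' /dev/mapper/luks-* /mnt/backup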

I'm almost at the point where I'll run btrfs check --repair. Here are some dry runs:

$ sudo btrfs check /dev/mapper/luks-*

parent transid verify failed on 38813696 wanted 82115 found 82116
parent transid verify failed on 38813696 wanted 82115 found 82116
checksum verify failed on 38813696 found 4E67B99A wanted AA84042C
parent transid verify failed on 38813696 wanted 82115 found 82116
Ignoring transid failure
Checking filesystem on /dev/mapper/luks-*
UUID: *
Error: could not find extent items for root 258
ERROR: failed to repair root items: No such file or directory

$ sudo btrfs check --check-data-csum /dev/mapper/luks-*

parent transid verify failed on 38813696 wanted 82115 found 82116
parent transid verify failed on 38813696 wanted 82115 found 82116
checksum verify failed on 38813696 found 4E67B99A wanted AA84042C
parent transid verify failed on 38813696 wanted 82115 found 82116
Ignoring transid failure
Checking filesystem on /dev/mapper/luks-*
UUID: *
Error: could not find extent items for root 258
ERROR: failed to repair root items: No such file or directory

$ sudo btrfs check --init-extent-tree /dev/mapper/luks-*

Checking filesystem on /dev/mapper/luks-*
UUID: *
Creating a new extent tree
Failed to find [38879232, 168, 16384]
btrfs unable to find ref byte nr 38895616 parent 0 root 1  owner 1 offset 0
Failed to find [38879232, 168, 16384]
btrfs unable to find ref byte nr 38912000 parent 0 root 1  owner 0 offset 1
parent transid verify failed on 38944768 wanted 82115 found 82116
Ignoring transid failure
checking extents
parent transid verify failed on 38977536 wanted 82072 found 82114
Ignoring transid failure
leaf parent key incorrect 38977536
bad block 38977536
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 missing its root dir, recreating
Failed to find [39043072, 168, 16384]
btrfs unable to find ref byte nr 1792131072 parent 0 root 4  owner 1 offset 0
Failed to find [39043072, 168, 16384]
btrfs unable to find ref byte nr 39878656 parent 0 root 4  owner 0 offset 1
Failed to find [22020096, 168, 16384]
btrfs unable to find ref byte nr 22036480 parent 0 root 3  owner 0 offset 1
leaf free space ret -21995796, leaf data size 16283, used 22012079 nritems 50
leaf free space ret -21995796, leaf data size 16283, used 22012079 nritems 50
leaf free space incorrect 22020096 -21995796
extent-tree.c:1915: do_chunk_alloc: BUG_ON `ret` triggered, value -1
btrfs(+0x1f1e5)[0x563d259ea1e5]
btrfs(+0x1f255)[0x563d259ea255]
btrfs(+0x1f268)[0x563d259ea268]
btrfs(+0x22cea)[0x563d259edcea]
btrfs(btrfs_reserve_extent+0xf9)[0x563d259ede44]
btrfs(btrfs_alloc_free_block+0x5e)[0x563d259ee5df]
btrfs(__btrfs_cow_block+0xfe)[0x563d259e2c3c]
btrfs(btrfs_cow_block+0xc5)[0x563d259e31e1]
btrfs(btrfs_search_slot+0xfa)[0x563d259e5095]
btrfs(btrfs_insert_empty_items+0x82)[0x563d259e62cf]
btrfs(btrfs_insert_item+0x64)[0x563d259e6600]
btrfs(btrfs_insert_inode+0x37)[0x563d259f3c89]
btrfs(btrfs_make_root_dir+0xb4)[0x563d259f9de3]
btrfs(+0x15c39)[0x563d259e0c39]
btrfs(cmd_check+0x19fb)[0x563d25a1efe2]
btrfs(main+0x143)[0x563d259e1c87]
Aborted

Any ideas on what to do? I'll try another chunk-recover now with the faulty drive connected and see if the server stays alive this time. I also have the chunk-recovery log, if needed.

Daniel Holm
  • Go to the Btrfs mailing list and ask for help there, if this problem still exists. RAID5 has had many issues in the past and was just garbage; the status page of the Btrfs wiki currently lists it as "mostly ok", which is still really nothing I would want for important data or a production system. – Marc Stürmer Dec 22 '19 at 22:57
