I have a 10TB BTRFS filesystem spread across 7 whole-disk devices (no partitions) in a JBOD server; each device is a physical drive that the controller exposes as a single-drive RAID0 volume*. The filesystem was created with RAID1 for data, metadata and system, so only 5TB of space is usable.
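For context, a layout like this would typically have been created along these lines (a sketch only; the member device names mirror the current ones but are illustrative):
# RAID1 for data and metadata; the system profile follows the metadata profile
mkfs.btrfs -L data -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
# Mounting any one member brings up the whole multi-device filesystem
mount /dev/sdb /data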
The setup suffered several power outages and the volume is now corrupted.
I started a btrfs scrub, which took 10 hours; it corrected some errors but still reports unrecoverable ones. Here's the raw per-device log (the commands used are sketched after it):
scrub status:1
1ea7ff96-0c60-46c3-869c-ae398cd106a8:3|data_extents_scrubbed:43337833|tree_extents_scrubbed:274036|data_bytes_scrubbed:2831212044288|tree_bytes_scrubbed:4489805824|read_errors:0|csum_errors:0|verify_errors:0|no_csum:45248|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:2908834758656|t_start:1548346756|t_resumed:0|duration:33370|canceled:0|finished:1
1ea7ff96-0c60-46c3-869c-ae398cd106a8:4|data_extents_scrubbed:6079208|tree_extents_scrubbed:57260|data_bytes_scrubbed:397180661760|tree_bytes_scrubbed:938147840|read_errors:0|csum_errors:0|verify_errors:0|no_csum:5248|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:409096683520|t_start:1548346756|t_resumed:0|duration:6044|canceled:0|finished:1
1ea7ff96-0c60-46c3-869c-ae398cd106a8:5|data_extents_scrubbed:13713623|tree_extents_scrubbed:63427|data_bytes_scrubbed:895829155840|tree_bytes_scrubbed:1039187968|read_errors:67549319|csum_errors:34597|verify_errors:45|no_csum:40128|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:67546631|corrected_errors:37330|last_physical:909460373504|t_start:1548346756|t_resumed:0|duration:20996|canceled:0|finished:1
1ea7ff96-0c60-46c3-869c-ae398cd106a8:6|data_extents_scrubbed:44399586|tree_extents_scrubbed:267573|data_bytes_scrubbed:2890078298112|tree_bytes_scrubbed:4383916032|read_errors:0|csum_errors:0|verify_errors:0|no_csum:264000|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:2908834758656|t_start:1548346756|t_resumed:0|duration:35430|canceled:0|finished:1
1ea7ff96-0c60-46c3-869c-ae398cd106a8:7|data_extents_scrubbed:13852777|tree_extents_scrubbed:0|data_bytes_scrubbed:898808254464|tree_bytes_scrubbed:0|read_errors:0|csum_errors:0|verify_errors:0|no_csum:133376|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:909460373504|t_start:1548346756|t_resumed:0|duration:20638|canceled:0|finished:1
1ea7ff96-0c60-46c3-869c-ae398cd106a8:8|data_extents_scrubbed:13806820|tree_extents_scrubbed:0|data_bytes_scrubbed:896648761344|tree_bytes_scrubbed:0|read_errors:0|csum_errors:0|verify_errors:0|no_csum:63808|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:909460373504|t_start:1548346756|t_resumed:0|duration:20443|canceled:0|finished:1
1ea7ff96-0c60-46c3-869c-ae398cd106a8:9|data_extents_scrubbed:5443823|tree_extents_scrubbed:0|data_bytes_scrubbed:356618694656|tree_bytes_scrubbed:0|read_errors:0|csum_errors:0|verify_errors:0|no_csum:0|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:377958170624|t_start:1548346756|t_resumed:0|duration:3199|canceled:0|finished:1
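For reference, the scrub was driven with commands along these lines (a sketch; the pipe-separated lines above are the raw per-device counters that btrfs-progs keeps for the scrub status):
# Start the scrub on the mounted filesystem, then poll it
btrfs scrub start /data
btrfs scrub status /data        # human-readable summary
btrfs scrub status -dR /data    # raw per-device counters, like the log above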
I then unmounted the volume and ran btrfs check --repair, with this output (the exact sequence is sketched after it):
Checking filesystem on /dev/sdb
UUID: 1ea7ff96-0c60-46c3-869c-ae398cd106a8
checking extents [o]
cache and super generation don't match, space cache will be invalidated
checking fs roots [o]
checking csums
checking root refs
found 4588612874240 bytes used err is 0
total csum bytes: 4474665852
total tree bytes: 5423104000
total fs tree bytes: 734445568
total extent tree bytes: 71221248
btree space waste bytes: 207577944
file data blocks allocated: 4583189770240
referenced 4583185391616
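For completeness, the sequence behind that output was essentially the following (a sketch; a plain btrfs check without --repair is the read-only variant and the safer first step):
umount /data
btrfs check /dev/sdb            # read-only check, makes no changes
btrfs check --repair /dev/sdb   # repair mode, as run above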
Now I can't mount the volume with mount -a; it fails with this output:
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
Inspecting dmesg, I found these messages emitted during the scrub:
[37825.838303] BTRFS error (device sde): bdev /dev/sdf errs: wr 67699124, rd 67694614, flush 0, corrupt 34597, gen 45
[37826.202827] sd 1:1:0:4: rejecting I/O to offline device
The later mount attempts log the following errors in dmesg:
[pciavald@Host-005 ~]$ sudo mount -a
[63078.778765] BTRFS info (device sde): disk space caching is enabled
[63078.778771] BTRFS info (device sde): has skinny extents
[63078.779882] BTRFS error (device sde): failed to read chunk tree: -5
[63078.790696] BTRFS: open_ctree failed
[pciavald@Host-005 ~]$ sudo mount -o recovery,ro /dev/sdb /data
[75788.205006] BTRFS warning (device sde): 'recovery' is deprecated, use 'usebackuproot' instead
[75788.205012] BTRFS info (device sde): trying to use backup root at mount time
[75788.205016] BTRFS info (device sde): disk space caching is enabled
[75788.205018] BTRFS info (device sde): has skinny extents
[75788.206382] BTRFS error (device sde): failed to read chunk tree: -5
[75788.215661] BTRFS: open_ctree failed
[pciavald@Host-005 ~]$ sudo mount -o usebackuproot,ro /dev/sdb /data
[76171.713546] BTRFS info (device sde): trying to use backup root at mount time
[76171.713552] BTRFS info (device sde): disk space caching is enabled
[76171.713556] BTRFS info (device sde): has skinny extents
[76171.714829] BTRFS error (device sde): failed to read chunk tree: -5
[76171.725735] BTRFS: open_ctree failed
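With one member apparently offline, another option would have been to rescan the member devices and attempt a degraded, read-only mount; a sketch of that (not something I show output for above):
# Re-register all btrfs member devices with the kernel
btrfs device scan
# Tolerate a missing device, read-only for safety
mount -o degraded,ro /dev/sdb /data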
From the scrub log, all unrecoverable errors are located on a single drive, devid 5 (reported as 1ea7ff96-0c60-46c3-869c-ae398cd106a8:5). The dmesg messages tie those errors to drive /dev/sdf.
*: I know that running BTRFS on volumes managed by a hardware RAID controller rather than directly on the physical drives is not ideal, but I had no choice. Each drive inserted in the array is configured as a single-drive RAID0 logical volume, which is what makes it visible to the OS. These logical volumes were then used whole (no partitions) as BTRFS devices, with data and metadata duplicated across them.
EDIT: I went down to the server to reboot it into a newer kernel and noticed that the drive with the errors, /dev/sdf, had its fail-state LED on. I shut down the server, power-cycled the JBOD and the server, and the LED turned green again. The volume now mounts correctly and I relaunched the scrub. After 6 minutes, the status already showed errors, with no indication yet of whether they can be corrected:
scrub status for 1ea7ff96-0c60-46c3-869c-ae398cd106a8
scrub started at Fri Jan 25 11:53:28 2019, running for 00:06:31
total bytes scrubbed: 243.83GiB with 3 errors
error details: super=3
corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
When the scrub ended, after 8 hours this time, the output was as follows:
scrub status for 1ea7ff96-0c60-46c3-869c-ae398cd106a8
scrub started at Fri Jan 25 11:53:28 2019 and finished after 07:59:20
total bytes scrubbed: 8.35TiB with 67549322 errors
error details: read=67549306 super=3 csum=13
corrected errors: 2701, uncorrectable errors: 67546618, unverified errors: 0
The raw per-device log for that scrub is as follows:
1ea7ff96-0c60-46c3-869c-ae398cd106a8:3|data_extents_scrubbed:43337833|tree_extents_scrubbed:273855|data_bytes_scrubbed:2831212044288|tree_bytes_scrubbed:4486840320|read_errors:0|csum_errors:0|verify_errors:0|no_csum:45248|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:2908834758656|t_start:1548413608|t_resumed:0|duration:26986|canceled:0|finished:1
1ea7ff96-0c60-46c3-869c-ae398cd106a8:4|data_extents_scrubbed:6079208|tree_extents_scrubbed:57127|data_bytes_scrubbed:397180661760|tree_bytes_scrubbed:935968768|read_errors:0|csum_errors:0|verify_errors:0|no_csum:5248|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:409096683520|t_start:1548413608|t_resumed:0|duration:6031|canceled:0|finished:1
1ea7ff96-0c60-46c3-869c-ae398cd106a8:5|data_extents_scrubbed:13713623|tree_extents_scrubbed:63206|data_bytes_scrubbed:895829155840|tree_bytes_scrubbed:1035567104|read_errors:67549306|csum_errors:13|verify_errors:0|no_csum:40128|csum_discards:0|super_errors:3|malloc_errors:0|uncorrectable_errors:67546618|corrected_errors:2701|last_physical:909460373504|t_start:1548413608|t_resumed:0|duration:14690|canceled:0|finished:1
1ea7ff96-0c60-46c3-869c-ae398cd106a8:6|data_extents_scrubbed:44399652|tree_extents_scrubbed:267794|data_bytes_scrubbed:2890081705984|tree_bytes_scrubbed:4387536896|read_errors:0|csum_errors:0|verify_errors:0|no_csum:264832|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:2908834758656|t_start:1548413608|t_resumed:0|duration:28760|canceled:0|finished:1
1ea7ff96-0c60-46c3-869c-ae398cd106a8:7|data_extents_scrubbed:13852771|tree_extents_scrubbed:0|data_bytes_scrubbed:898807992320|tree_bytes_scrubbed:0|read_errors:0|csum_errors:0|verify_errors:0|no_csum:133312|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:909460373504|t_start:1548413608|t_resumed:0|duration:14372|canceled:0|finished:1
1ea7ff96-0c60-46c3-869c-ae398cd106a8:8|data_extents_scrubbed:13806827|tree_extents_scrubbed:0|data_bytes_scrubbed:896649023488|tree_bytes_scrubbed:0|read_errors:0|csum_errors:0|verify_errors:0|no_csum:63872|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:909460373504|t_start:1548413608|t_resumed:0|duration:14059|canceled:0|finished:1
1ea7ff96-0c60-46c3-869c-ae398cd106a8:9|data_extents_scrubbed:5443823|tree_extents_scrubbed:3|data_bytes_scrubbed:356618694656|tree_bytes_scrubbed:49152|read_errors:0|csum_errors:0|verify_errors:0|no_csum:0|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:377991725056|t_start:1548413608|t_resumed:0|duration:3275|canceled:0|finished:1
The same device (devid 5) has uncorrectable errors again, so I tried listing the btrfs devices, and devid 5 is missing from the list:
[pciavald@Host-001 ~]$ sudo btrfs fi show /data
Label: 'data' uuid: 1ea7ff96-0c60-46c3-869c-ae398cd106a8
Total devices 7 FS bytes used 4.17TiB
devid 3 size 2.73TiB used 2.65TiB path /dev/sdd
devid 4 size 465.73GiB used 381.00GiB path /dev/sde
devid 6 size 2.73TiB used 2.65TiB path /dev/sdb
devid 7 size 931.48GiB used 847.00GiB path /dev/sdc
devid 8 size 931.48GiB used 847.00GiB path /dev/sdg
devid 9 size 931.48GiB used 352.03GiB path /dev/sdh
*** Some devices missing
All devices are listed here except devid 5 and /dev/sdf, so I guess that is the broken drive. Because the data is duplicated, I should be able to delete this device and rebalance the setup, so I tried:
[pciavald@Host-001 ~]$ sudo btrfs device delete /dev/sdf /data
ERROR: error removing device '/dev/sdf': No such device or address
How can I properly delete that device?
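For reference, the documented way to drop a member that is truly absent looks roughly like this (untested here; whether the degraded mount is required depends on the filesystem's state):
# If the filesystem only mounts with a device missing, mount it degraded first
mount -o degraded /dev/sdb /data
# Remove the absent member by the special name 'missing'
# (recent btrfs-progs also accept the devid, e.g. 5)
btrfs device delete missing /data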
EDIT 2: I went to #btrfs on freenode IRC to get help and did the following investigation. The usage output shows the overall layout, with data mirrored across two different drives:
[pciavald@Host-001 ~]$ sudo btrfs fi usage /data
Overall:
Device size: 9.55TiB
Device allocated: 8.49TiB
Device unallocated: 1.06TiB
Device missing: 931.48GiB
Used: 8.35TiB
Free (estimated): 615.37GiB (min: 615.37GiB)
Data ratio: 2.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00B)
Data,RAID1: Size:4.24TiB, Used:4.17TiB
/dev/sdb 2.64TiB
/dev/sdc 847.00GiB
/dev/sdd 2.64TiB
/dev/sde 380.00GiB
/dev/sdf 846.00GiB
/dev/sdg 847.00GiB
/dev/sdh 352.00GiB
Metadata,RAID1: Size:6.00GiB, Used:5.05GiB
/dev/sdb 5.00GiB
/dev/sdd 5.00GiB
/dev/sde 1.00GiB
/dev/sdf 1.00GiB
System,RAID1: Size:64.00MiB, Used:624.00KiB
/dev/sdb 64.00MiB
/dev/sdd 32.00MiB
/dev/sdh 32.00MiB
Unallocated:
/dev/sdb 85.43GiB
/dev/sdc 84.48GiB
/dev/sdd 85.46GiB
/dev/sde 84.73GiB
/dev/sdf 84.48GiB
/dev/sdg 84.48GiB
/dev/sdh 579.45GiB
btrfs dev stats /data shows that all errors are located on /dev/sdf, indicating that the scrub's unrecoverable errors were not caused by corruption in the mirror copy of the data but by the OS being unable to read from or write to the defective drive:
[/dev/sdd].write_io_errs 0
[/dev/sdd].read_io_errs 0
[/dev/sdd].flush_io_errs 0
[/dev/sdd].corruption_errs 0
[/dev/sdd].generation_errs 0
[/dev/sde].write_io_errs 0
[/dev/sde].read_io_errs 0
[/dev/sde].flush_io_errs 0
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0
[/dev/sdf].write_io_errs 135274911
[/dev/sdf].read_io_errs 135262641
[/dev/sdf].flush_io_errs 0
[/dev/sdf].corruption_errs 34610
[/dev/sdf].generation_errs 48
[/dev/sdb].write_io_errs 0
[/dev/sdb].read_io_errs 0
[/dev/sdb].flush_io_errs 0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
[/dev/sdc].write_io_errs 0
[/dev/sdc].read_io_errs 0
[/dev/sdc].flush_io_errs 0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0
[/dev/sdg].write_io_errs 0
[/dev/sdg].read_io_errs 0
[/dev/sdg].flush_io_errs 0
[/dev/sdg].corruption_errs 0
[/dev/sdg].generation_errs 0
[/dev/sdh].write_io_errs 0
[/dev/sdh].read_io_errs 0
[/dev/sdh].flush_io_errs 0
[/dev/sdh].corruption_errs 0
[/dev/sdh].generation_errs 0
I've ordered a new 1TB drive to replace /dev/sdf and will write an answer to this question once I've managed to replace it (a sketch of the planned commands follows).
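As a sketch of the planned swap (the new drive's name, /dev/sdi, is hypothetical; devid 5 is the failed member):
# Replace devid 5 with the new drive; -r prefers reading from the healthy
# mirror instead of the failing source
btrfs replace start -r 5 /dev/sdi /data
btrfs replace status /data      # monitor progress
# Afterwards: clear the per-device error counters and re-verify the mirror
btrfs device stats -z /data
btrfs scrub start /data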