
I had a pair of 3TB disks in a btrfs raid1 array.

One of these disks started failing (smartd shows bad sectors), and so I bought a pair of new 8TB drives to replace both disks in the array.

I replaced both with btrfs replace and ran a btrfs balance afterwards, which fails with the following message:

[ 5063.136378] BTRFS error (device sdc): parent transid verify failed on 5153170751488 wanted 1433374 found 1417912
[ 5063.140428] BTRFS error (device sdc): parent transid verify failed on 5153170751488 wanted 1433374 found 1417912
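
Both lines point at the same logical address, so this looks like one stale block reported twice rather than many different ones. A minimal sketch for summarizing such lines from a saved `dmesg` dump follows. It assumes the message layout is exactly as in the two lines above; the names `TRANSID_RE`, `summarize_transid_errors` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Pattern for the kernel message quoted above. The field layout is inferred
# from those two lines only (an assumption), not taken from the kernel source.
TRANSID_RE = re.compile(
    r"BTRFS error \(device (?P<dev>\w+)\): parent transid verify failed "
    r"on (?P<bytenr>\d+) wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def summarize_transid_errors(log_text):
    """Count distinct (device, address, wanted, found) tuples in a log dump."""
    counts = Counter()
    for line in log_text.splitlines():
        m = TRANSID_RE.search(line)
        if m:
            key = (m["dev"], int(m["bytenr"]), int(m["wanted"]), int(m["found"]))
            counts[key] += 1
    return counts


# The two log lines from the question, used as sample input.
SAMPLE = """\
[ 5063.136378] BTRFS error (device sdc): parent transid verify failed on 5153170751488 wanted 1433374 found 1417912
[ 5063.140428] BTRFS error (device sdc): parent transid verify failed on 5153170751488 wanted 1433374 found 1417912
"""

if __name__ == "__main__":
    for (dev, bytenr, wanted, found), n in summarize_transid_errors(SAMPLE).items():
        # wanted - found is how far the block on disk lags the expected transid
        print(f"{dev}: address {bytenr} seen {n}x, transid behind by {wanted - found}")
```

On a longer log this would show at a glance whether the errors cluster on a handful of addresses or are spread across the filesystem.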

I saw exactly these messages before replacing the disks, but since both disks have now been replaced, I believe the problem has to do with btrfs itself rather than with the hardware.

My data is fully backed up and the filesystem is online and working properly, but I cannot perform a balance due to this error. Running a scrub reports a small number of uncorrectable errors, just as it did before I replaced the disks.

I was wondering how I could, perhaps:

  1. Find out which files are corrupted and restore them from a backup
  2. Reset the transaction on the filesystem to remove the errors
  3. Ignore the errors while balancing

...or any other reasonable solution.

Thanks!

dkd6
  • It might be a bit late, but I want to explain something about btrfs that you don't seem to know. Unlike many other filesystems, btrfs checksums not only the metadata but also the data itself. When btrfs detects an error, it automatically tries to fix it, which means using the redundant copy from DUP or RAID1. If no such copy is available, btrfs just notifies the system that a file is corrupt. At that point the system admin should restore the lost data from a real backup. What you have done is ignore data loss. – paladin Apr 25 '22 at 12:42
  • The next time you see such an error, it's not an error in btrfs itself: your data is corrupted and you should recover it from backup, if possible. In contrast, ext4 and other filesystems only care about the consistency of their metadata. It's entirely possible to lose data on ext4 without knowing it. btrfs, on the other hand, knows when it has lost data, and that's a key advantage over ext4. – paladin Apr 25 '22 at 12:45
  • Hi, thanks for clarifying. What I eventually ended up doing was restoring the data from a backup onto a newly formatted filesystem. Looking at similar posts online, I could see that in most cases `dmesg` shows the paths of the corrupt files it discovers, yet in my case I could only see the `parent transid verify failed` errors, which I find confusing... – dkd6 Apr 27 '22 at 11:39

1 Answer


I made a few more attempts to solve this, but in the end only a clean reformat of the filesystem solved the issue.

Once I had transferred the data off the disks, I tried two dangerous commands - btrfs check --init-csum-tree and btrfs check --repair - neither of which did any harm, but neither solved the issue either.

After reformatting, I transferred the data back onto the filesystem, ran a balance and a scrub on it, and now everything is working again.
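
For reference, the balance and scrub after the restore could be scripted roughly as below. This is only a sketch: the mount point `/mnt/data` and the helper names `maintenance_commands` and `run_maintenance` are hypothetical, and it defaults to a dry run that only prints the commands, since actually running them needs root and a mounted btrfs filesystem.

```python
import subprocess


def maintenance_commands(mount_point):
    """Return the balance and scrub invocations for a mounted btrfs filesystem."""
    return [
        ["btrfs", "balance", "start", mount_point],
        # -B keeps the scrub in the foreground so its exit status is available
        ["btrfs", "scrub", "start", "-B", mount_point],
    ]


def run_maintenance(mount_point, dry_run=True):
    """Print each command; also execute it when dry_run is False (needs root)."""
    for cmd in maintenance_commands(mount_point):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if btrfs reports a failure
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_maintenance("/mnt/data")  # hypothetical mount point, dry run only
```

Running the scrub in the foreground means a non-zero exit status stops the script, so leftover uncorrectable errors are hard to miss.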

Cheers!

dkd6