We had a disk in a RAID array go bad, and we suspect another one may have minor errors. To be safe, we are trying to recover both disks using ddrescue. The various help pages recommend a two-to-three-pass copy: first a logged no-split pass, then going back over the error sectors, e.g.:

ddrescue --no-split --force /dev/sdc /dev/sdb logfile

ddrescue --direct --max-retries=3 /dev/sdc /dev/sdb logfile

then running the second command again with --retrim added if there are still errors.

The problem is that I can see the initial pass occasionally slowing down. I checked the dmesg log and I see the same type of I/O errors (Medium Error tag#25 Sense: Unrecovered read error) in the system log, but ddrescue is not registering any errors in its status.

UPDATE

OK, ddrescue is now showing 2 errors, but I see more than 2 in the system log, and ddrescue showed none when the first few errors appeared in the system log.

What I need to know is whether the second command above only checks sectors that ddrescue logged as bad, and whether I should re-run the first command with other flags, such as --direct, on that pass as well. (I'm wondering whether something in the drive firmware may be preventing ddrescue from seeing all the errors.)

SW

Addendum

While running a retrim, I watched the error count in the re-read passes drop to 285. It now reads 291. I thought the point of the later passes was to recover the error sectors specifically, so I did not expect the number to do anything but go down. What am I missing here?

Leo Gallego
Scott

1 Answer

The error counter appears to show the number of unreadable blocks (in ddrescue parlance, a block is a range of consecutive unreadable sectors), not the number of bad sectors. If you have 3 unreadable sectors in a row and recover the middle one in a retry pass, the single bad block splits into two, so the counter goes up by one even though one fewer sector is unreadable. The kernel itself retries several times if the disk does not respond within 60 seconds(?) and prints multiple error lines before it gives up on a sector, although AFAICT it only prints the sector number once.
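To make the difference between blocks and sectors concrete, here is a minimal sketch that counts both from a mapfile. It assumes the usual ddrescue mapfile layout (comment lines starting with #, then one status line, then lines of position, size and status with hex values, where - marks a bad-sector block). The name count_bad is hypothetical and not part of ddrescue.

```shell
#!/bin/sh
# count_bad MAPFILE [SECTOR_SIZE]
# Hypothetical helper (not part of ddrescue): prints "<bad blocks> <bad sectors>".
# Assumes the mapfile layout described above; SECTOR_SIZE defaults to 512.
count_bad() (
    sector_size=${2:-512}
    blocks=0
    sectors=0
    seen_status_line=0
    while read -r pos size status rest; do
        case $pos in
            ''|'#'*) continue ;;        # skip blank and comment lines
        esac
        if [ "$seen_status_line" -eq 0 ]; then
            seen_status_line=1          # first data line is the status line
            continue
        fi
        if [ "$status" = "-" ]; then
            blocks=$((blocks + 1))
            # size is hex (0x...); round up to whole sectors
            sectors=$((sectors + ($size + sector_size - 1) / sector_size))
        fi
    done < "$1"
    echo "$blocks $sectors"
)
```

For a mapfile containing a single 0x600-byte bad block this prints 1 3. Once the middle sector is recovered and the block is split in two, it prints 2 2: more blocks, fewer bad sectors.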

To display the number of faulty sectors, run ddrescuelog -l- <mapfile> | wc -l. This prints a list (-l) of all unreadable (-) sectors, one per line. (For disks with sectors larger than 512 bytes, you will likely need to specify the sector size manually with -b.)

As far as I understand, --direct is disabled by default because not all systems support it and cached reads are often faster. The latter is the reason for splitting the work into two commands: the first is meant to get 99.9% of the recoverable data as quickly as possible. If either mode is faster than your write speed, you can merge the two calls, AFAICT.
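As a rough sketch of what "split" versus "merged" could look like, the helper below only prints the command lines and runs nothing. It uses the short options (-f force, -n skip the slow phase, -d direct access, -r3 three retry passes) because, as far as I know, the long option names have changed between ddrescue releases. The name rescue_commands and the split/merged argument are hypothetical.

```shell
#!/bin/sh
# rescue_commands SRC DST MAPFILE [split|merged]
# Hypothetical dry-run helper: only prints ddrescue command lines, runs nothing.
rescue_commands() (
    src=$1
    dst=$2
    map=$3
    mode=${4:-split}
    if [ "$mode" = "split" ]; then
        # fast first pass with cached reads, skipping the slow phase (-n)
        echo "ddrescue -f -n $src $dst $map"
    fi
    # direct disc access (-d) with up to 3 retry passes (-r3);
    # in merged mode this single call does all the work
    echo "ddrescue -f -d -r3 $src $dst $map"
)

# Example: print the two-step plan for the devices from the question
rescue_commands /dev/sdc /dev/sdb logfile split
```

Check the printed lines against the man page of your installed version before running them, since -f allows overwriting the output device.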

mleise