I use robocopy to mirror a directory over the network to a machine that is susceptible to power loss. Several times, after a robocopy run that reported success, I've found files on the destination that have the right size and date but whose contents are all NUL bytes.

Robocopy copies a file by:

  1. Creating a file in the destination
  2. Setting its mod time to the epoch
  3. Setting the size of the file
  4. Writing the real bytes into the file
  5. Setting the mod/create times to match source
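For illustration only, the five steps above can be approximated in Python. This is a rough sketch of the observed sequence, not robocopy's actual implementation: `copy_like_robocopy` is a hypothetical helper, the "epoch" is assumed to be the Unix epoch, and the creation time is left alone because the standard library cannot set it.

```python
import os
import shutil


def copy_like_robocopy(src, dst):
    """Hypothetical helper that mimics the five observed steps."""
    st = os.stat(src)

    # 1. Create the (empty) destination file.
    with open(dst, 'wb'):
        pass

    # 2. Set its mod time to the epoch (assumed here: the Unix epoch).
    os.utime(dst, (0, 0))

    # 3. Set the size of the file; the new range reads back as NUL bytes.
    with open(dst, 'r+b') as f:
        f.truncate(st.st_size)

    # 4. Write the real bytes over the preallocated range.
    with open(src, 'rb') as fin, open(dst, 'r+b') as fout:
        shutil.copyfileobj(fin, fout)

    # 5. Set the access/mod times to match the source.
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```

If the data from step 4 never reaches the disk but the metadata from steps 3 and 5 does, the result is a file with the right size and date and NUL contents, which matches the symptom.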

So I was ending up with files that looked as if steps 1-3 and 5 had happened but step 4 had not. Robocopy uses mod dates to decide which files to update, so files "corrupted" by power loss during a run won't be fixed by a later successful robocopy run.
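Because size and date look correct, finding the damaged files means looking at their contents. A minimal sketch of such a check, assuming the source and destination trees are both reachable as local or UNC paths (`is_all_nul` and `find_suspect_files` are hypothetical names, not part of any tool):

```python
import os


def is_all_nul(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte is NUL."""
    saw_data = False
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return saw_data
            saw_data = True
            if chunk.count(b'\0') != len(chunk):
                return False


def find_suspect_files(src_root, dst_root):
    """List destination files that match the source in size but are all NUL."""
    suspects = []
    for dirpath, _dirs, files in os.walk(dst_root):
        for name in files:
            dst = os.path.join(dirpath, name)
            src = os.path.join(src_root, os.path.relpath(dst, dst_root))
            if not os.path.isfile(src):
                continue
            if os.path.getsize(src) != os.path.getsize(dst):
                continue
            # Skip files that are legitimately all NUL in the source too.
            if is_all_nul(dst) and not is_all_nul(src):
                suspects.append(dst)
    return suspects
```

Deleting the files this reports (or touching their source copies) should cause a later robocopy run to copy them again, since the dates would no longer match.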

Intrigued, I replicated this behavior on a large scale without robocopy. Running this Python code and pulling power after about 30000 files yields interesting results:

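for i in range(100000):
    # Print progress every 1000 files so power can be pulled around 30000
    if i and i%1000 == 0:
        print(i)
    # First open: create the file and set its size to 5 bytes (all NUL)
    with open(str(i),'wb') as f:
        f.truncate(5)
    # Second open: overwrite the 5 NUL bytes with real data
    with open(str(i),'r+') as f:
        f.write('ccccc')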

After power failure and reboot, I examined the files written. Thousands of them are the right size but have NUL contents. If I omit the first file open/truncate, all of them have the correct contents.
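To put numbers on "thousands", the files can be tallied after reboot. A small sketch, assuming the test files are named `0`, `1`, ... as in the script above and should each contain `ccccc`; `classify` is a hypothetical helper:

```python
import os

EXPECTED = b'ccccc'


def classify(directory, count):
    """Tally test files 0..count-1 by what they contain after reboot."""
    result = {'ok': 0, 'nul': 0, 'missing': 0, 'other': 0}
    for i in range(count):
        path = os.path.join(directory, str(i))
        if not os.path.exists(path):
            result['missing'] += 1
            continue
        with open(path, 'rb') as f:
            data = f.read()
        if data == EXPECTED:
            result['ok'] += 1
        elif data == b'\0' * len(EXPECTED):
            # Right size, but the write never made it to disk.
            result['nul'] += 1
        else:
            # Wrong size or partial contents.
            result['other'] += 1
    return result
```

For example, `print(classify('.', 100000))` run in the test directory gives the counts for each category.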

Any ideas what is going on here? Disabling write-caching on the drive doesn't help. The fact that the same problem doesn't happen for writing into a new file is confusing too.

This behavior makes robocopy pretty useless in power-loss scenarios. Even a power loss shortly after robocopy fully completes can lead to this kind of data loss.

aggieNick02
Experimenting further, I tried adding /J to my robocopy command, which uses unbuffered IO (the CreateFile call for the destination uses FILE_FLAG_NO_BUFFERING). This makes things better (only a few files with stale contents but updated mod times), but doesn't fix the problem completely. – aggieNick02 Oct 24 '19 at 19:20

0 Answers