I use robocopy to mirror a directory across a network to another machine. The destination machine is susceptible to power loss. Several times I've found that after a successful robocopy run, some of the destination files have the right size and date but contain nothing except NUL bytes.
Robocopy copies a file by:
1. Creating a file in the destination
2. Setting its mod time to the epoch
3. Setting the size of the file
4. Writing the real bytes into the file
5. Setting the mod/create times to match source
So I was ending up with files that looked as if steps 1-3 and 5 had happened, but step 4 had not. Robocopy uses mod dates to decide which files to update, so files "corrupted" by power loss during a robocopy run won't get fixed by a later successful run.
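To make the sequence concrete, here is a rough Python imitation of the five steps. It is only a sketch of the behavior described above, not robocopy's actual implementation: `copy_like_robocopy` is a hypothetical name, the Unix epoch (0) stands in for whatever placeholder date robocopy really uses, and `os.utime` can only restore access/mod times, not the creation time.

```python
import os

def copy_like_robocopy(src, dst):
    """Imitate the observed copy sequence (hypothetical helper, illustration only)."""
    st = os.stat(src)

    # Step 1: create the destination file (empty for now).
    with open(dst, "wb"):
        pass

    # Step 2: stamp it with a placeholder mod time.
    # Assumption: 0 (the Unix epoch) stands in for robocopy's placeholder date.
    os.utime(dst, (0, 0))

    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        # Step 3: set the final size before any real data is written.
        fout.truncate(st.st_size)
        # Step 4: write the real bytes over the preallocated range.
        fout.seek(0)
        fout.write(fin.read())

    # Step 5: restore the source's times (creation time is not handled here).
    os.utime(dst, (st.st_atime, st.st_mtime))
```

If power is lost after step 3 has reached the disk but before the data from step 4 has, the file already has its final size, which matches the symptom of right-sized files with no real content.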
Intrigued, I replicated this behavior on a large scale without robocopy. Running this Python code and pulling power after about 30000 files yields interesting results:
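    for i in range(100000):
        # Progress marker every 1000 files
        if i and i%1000 == 0:
            print(i)
        # First open: create the file and set its size to 5 bytes (reads back as NULs)
        with open(str(i),'wb') as f:
            f.truncate(5)
        # Second open: overwrite the 5 bytes with real content
        with open(str(i),'r+') as f:
            f.write('ccccc')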
After power failure and reboot, I examined the files written. Thousands of them are the right size but have NUL contents. If I omit the first file open/truncate, all of them have the correct contents.
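For anyone repeating the experiment, a scan along these lines can count the affected files after the reboot. It is a minimal sketch; `is_all_nul` and `find_nul_files` are hypothetical helper names, and it assumes that a non-empty file consisting entirely of NUL bytes is one of the damaged ones.

```python
import os

def is_all_nul(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte in it is NUL."""
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True  # reached end of file without seeing a non-NUL byte
            if chunk.count(b"\x00") != len(chunk):
                return False

def find_nul_files(root):
    """Walk root and return the sorted paths of all-NUL files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_all_nul(path):
                hits.append(path)
    return sorted(hits)
```

Calling `find_nul_files(".")` in the test directory lists the files whose size was set but whose contents never made it to disk.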
Any ideas what is going on here? Disabling write-caching on the drive doesn't help. The fact that the same problem doesn't happen for writing into a new file is confusing too.
This behavior makes robocopy pretty useless in power-loss scenarios. Even a power loss shortly after robocopy has fully completed can leave files in this state.