
I'm working on a small backup script using robocopy and hardlinks. My aim is to have differential backups using a concept similar to rsync's --link-dest, without having to resort to third-party tools. For those unfamiliar, the idea is that the first backup is taken in full, and each subsequent backup hardlinks unchanged files from the previous backup and copies only the files that have changed. This leaves you with multiple full point-in-time backup directories that occupy much less disk space (thanks to the hardlinking) than separate full backups would.
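For readers who prefer code, the --link-dest idea can be sketched roughly as follows. This is an illustrative Python sketch, not the script from this question: the names `link_dest_backup` and `unchanged` are hypothetical, and treating "same size and same modification time (to the second)" as "unchanged" is an assumption made for the example.

```python
import os
import shutil


def unchanged(a, b):
    """Assumed change test: same size and same mtime (whole seconds)."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def link_dest_backup(source, previous, target):
    """Back up `source` into `target`.

    Files that look unchanged compared to the `previous` backup are
    hardlinked from it; everything else is copied from `source`.
    Pass previous=None for the first (full) backup.
    Returns (linked, copied) lists of relative paths.
    """
    linked, copied = [], []
    for dirpath, _dirnames, filenames in os.walk(source):
        rel_dir = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel_dir)), exist_ok=True)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            src = os.path.join(source, rel)
            dst = os.path.join(target, rel)
            old = os.path.join(previous, rel) if previous else None
            if old and os.path.isfile(old) and unchanged(src, old):
                os.link(old, dst)        # new name for the same file data
                linked.append(rel)
            else:
                shutil.copy2(src, dst)   # copy data and timestamps
                copied.append(rel)
    return linked, copied
```

Calling `link_dest_backup(src, None, first)` produces the full backup, and `link_dest_backup(src, first, second)` produces the next one, where only changed files take up new space. Note that this sketch does not copy ACLs or owner information, which is the part the question is about.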

My problem is that whenever I hardlink a file, robocopy reports the hardlinked file as modified, even though I presumably haven't changed it in any way. At first I suspected that the access time might have changed, but that's not the case: the file is reported as modified only when I'm trying to copy ACLs or owner information (/COPY:S or /COPY:O).

This is how I'm attempting to do it:

  1. Have a source directory structure you want to copy

    mkdir C:\BackupSource
    type nul > C:\BackupSource\myfile.txt
    
  2. Have a target root directory for backup

    mkdir C:\BackupTarget
    
  3. Run first full backup
    I'm using more flags in the script but these are enough to trigger the behavior.

    robocopy C:\BackupSource C:\BackupTarget\1 /MIR /COPY:DATSO
    
  4. Check that the files are up-to-date
    (not part of the script, just for the sake of the question)

    robocopy C:\BackupSource C:\BackupTarget\1 /MIR /COPY:DATSO /L
    

    This tells me that the file has been skipped, which makes sense because at this point it hasn't been modified.

  5. Create a directory structure for subsequent backup
    This will create an empty directory structure identical to the previous backup. I'm traversing the previous backup and creating the hardlinks in the next step.

    robocopy C:\BackupTarget\1 C:\BackupTarget\2 /MIR /CREATE /DCOPY:DAT /XF *
    
  6. Hardlink the files
    This part is obviously a bit more complicated in the script, as it descends into the subdirectories, but one file is enough for the purpose of the question.

    mklink /H C:\BackupTarget\2\myfile.txt C:\BackupTarget\1\myfile.txt
    

Now, at this point, I would think I haven't modified the files in any of the directories. The documentation for the CreateHardLinkW function (which is the one I'm really using in the script) says:

The security descriptor belongs to the file to which a hard link points...
You cannot give a file different security descriptors on a per-hard-link basis...
This function does not modify the security descriptor of the file to be linked to ...

However, when I now check if the file has been modified, robocopy tells me that both the hardlink and the file from the original backup to which it points were modified.

robocopy C:\BackupSource C:\BackupTarget\1 /MIR /COPY:DATSO /L
robocopy C:\BackupSource C:\BackupTarget\2 /MIR /COPY:DATSO /L

I have checked the standard attributes, the creation/modification/access times, the ACLs and the owner, and they are exactly the same. If I use just /COPY:DAT (which is the default), robocopy tells me that there were no modifications, which leads me to believe that hardlinking does change something in the security descriptors.
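As an illustration of this kind of check, here is a small Python sketch that compares the basic metadata of two paths. It is not the check actually used above (that was done through .NET objects), the helper name `stat_differences` and the list of fields are assumptions for the example, and it covers neither ACLs nor owner information, which would need Windows-specific tooling.

```python
import os

# Illustrative choice of fields. st_file_attributes exists only on Windows,
# so it is read with a default of None elsewhere.
FIELDS = (
    "st_size",
    "st_mode",
    "st_mtime_ns",
    "st_atime_ns",
    "st_ctime_ns",          # creation time on Windows
    "st_file_attributes",
)


def stat_differences(path_a, path_b):
    """Return {field: (value_a, value_b)} for every compared field that differs."""
    a, b = os.stat(path_a), os.stat(path_b)
    diffs = {}
    for field in FIELDS:
        va, vb = getattr(a, field, None), getattr(b, field, None)
        if va != vb:
            diffs[field] = (va, vb)
    return diffs
```

An empty result means the compared fields match. It says nothing about the security descriptor.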

When I run the command for the subsequent backup as I normally would:

robocopy C:\BackupSource C:\BackupTarget\2 /MIR /COPY:DATSO

robocopy tells me it modified all my files, but I assume it has fixed just the security descriptors, as the backup goes much faster than the first full one. Also, when I check the hardlink with

fsutil hardlink list C:\BackupTarget\2\myfile.txt

it reports that the file is still a hardlink pointing to the same file I pointed it to before. The problem might seem purely cosmetic, but when robocopy reports all files as modified in every backup, the value of such logs is greatly diminished.
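The same "is it still a hardlink to the original?" check can be sketched in Python, as an alternative to reading the fsutil output by eye. The helper name `hardlink_status` is hypothetical. It relies only on `os.stat` (link count) and `os.path.samefile`.

```python
import os


def hardlink_status(path, expected_target):
    """Report the link count of `path` and whether it is the same file as `expected_target`."""
    st = os.stat(path)
    return {
        "link_count": st.st_nlink,  # number of names pointing at this file
        "same_file": os.path.samefile(path, expected_target),
    }
```

For the example above, one would expect `hardlink_status(r"C:\BackupTarget\2\myfile.txt", r"C:\BackupTarget\1\myfile.txt")` to report a link count of at least 2 and `same_file` as true, as long as robocopy has not replaced the link with a fresh copy.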

Why does robocopy think that a file's security descriptors have been modified when I point a hardlink to it? How can I prevent it or fix it after hardlinking and before the subsequent robocopy is started?

Disassembler
  • One thought: how did you check the timestamps? Sometimes the cached timestamp information stored in the directory entry can be out of date, and this sounds like exactly the sort of scenario where that might happen. – Harry Johnston Nov 11 '18 at 22:52
  • By inspecting the properties of .NET FileSystemInfo and FileInfo objects (which matched what Explorer's properties dialog showed). The files are reported as modified only with `/COPY:S` and `/COPY:O` but not with `/COPY:DAT`, which is why I think it's not the timestamps. – Disassembler Nov 12 '18 at 07:15
  • I tried this but could not duplicate – spacenomyous Nov 16 '18 at 18:53
