
I am using rsync to copy data from a local LVM drive to a locally mounted glusterfs mount. I'm trying to bring the two into sync and eventually cut over to using only the glusterfs mount. Here is the command that I am using:

rsync -av --inplace --no-whole-file /mnt/lvm-ext4/ /mnt/gluster

I currently have the initial data push done and am now trying to catch up with the new files that have been added. The LVM volume currently has around 14TB of data with around 25 million small files, but I think rsync is suffering with this many files.

While watching the output of this second run, I see that most of the entries rsync prints are directory entries, with only a small set of actual files being moved. I believe this is because the directories have been touched and their atime has been updated, thus requiring rsync to update them on the remote side. I've recently learned about the nodiratime mount option and applied it today, so hopefully that will help with future syncs.

Is there a way to have rsync skip these directory updates and transfer only new or changed files and new directories? I've seen the --ignore-times and --checksum options, but they seem to be all-inclusive.

Ken S.
  • Answer in: https://stackoverflow.com/questions/35829263/ignore-subdirectories-timestamps-when-syncing-from-shell – Ferroao Jun 11 '19 at 18:25

1 Answer


I don't think your problem is really related to directory timestamps.

With so much data, rsync will simply need some time to discover the changed files and begin transferring them. If in the meantime it discovers a directory a/m/ctime change, it will replicate that change on the receiver side, but this should be almost instantaneous.
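One way to check whether directory timestamps account for the entries you are seeing is a dry run with itemized output. The sketch below uses throwaway directories under a temporary path (src, dst, and sub are hypothetical stand-ins, not the real mount points) and compares a plain dry run with one that adds rsync's -O (--omit-dir-times) option, which tells rsync not to preserve modification times on directories. It only illustrates the options; it is not a tuned command for a 25-million-file tree.

```shell
# Scratch demo; src/dst are throwaway stand-ins for the real mount points.
work=$(mktemp -d)
src="$work/src"; dst="$work/dst"
mkdir -p "$src/sub" "$dst"
echo data > "$src/sub/file.txt"

# Initial sync, so both sides match.
rsync -a "$src/" "$dst/"

# Change only the directory's mtime on the source side.
touch -t 200101010000 "$src/sub"

# Dry run with itemized output (-i): count directory entries rsync would touch.
with_times=$(rsync -ai --dry-run "$src/" "$dst/" | grep -c '^\.d' || true)

# Same dry run, but with -O (--omit-dir-times) so directory mtimes are skipped.
without_times=$(rsync -aiO --dry-run "$src/" "$dst/" | grep -c '^\.d' || true)

echo "directory updates with dir times: $with_times"
echo "directory updates with -O:        $without_times"
rm -rf "$work"
```

If the real run behaves the same way, adding -O to the original command (for example rsync -avO --inplace --no-whole-file /mnt/lvm-ext4/ /mnt/gluster) should drop the timestamp-only directory entries from the output. It will not shorten the time rsync spends walking the tree to find changed files.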

shodanshok