I have a directory of images, currently at ~117k files for about 200 gig in size. My backup solution vomits on directories of that size, so I wish to split them into subdirectories of 1000. Name sorting or type discrimination is not required. I just want my backups to not go nuts.
From another answer, someone provided a way to move files into the split up configuration. However, that was a move, not a copy. Since this is a backup, I need a copy.
I have three thoughts:
1. Files are added to the large directory with random filenames, so alpha sorts aren't a practical way to figure out deltas. Even using a tool like rsync, adding a couple hundred files at the beginning of the list could cause a significant reshuffle and lots of file movement on the backup side.
2. The solution to this problem is to reverse the process: Do an initial file split, add new files to the backup at the newest directory, manually create a new subdir at the 1000 file mark, and then use rsync to pull files from the backup directories to the work area, eg rsync -trvh <backupdir>/<subdir>/ <masterdir>
.
3. While some answers to similar questions indicate that rsync is a poor choice for this, I may need to do multiple passes, one of which would be via a slower link to an offsite location. The performance hit of using rsync and its startup parsing is far superior to the length of time reuploading the backup on a daily basis would take.
My question is:
How do I create a script that will recurse into all 117+ subdirectories and dump the contained files into my large working directory, without a lot of unnecessary copying?
My initial research produces something like this:
#!/bin/bash
cd /path/to/backup/tree/root
find . -type d -exec rsync -trvh * /path/to/work/dir/
Am I on the right track here?
It's safe to assume modern versions of bash, find, and rsync.
Thanks!