
EDIT: I see someone has voted to close this as "too broad a question", but the last time I posted asking how to do something, I was asked to provide more detail about what I wanted to do and why. So... can't win! I'm only asking one question: "how to action a remote copy of a folder when the local copy changes". I'm not asking how to rename, renumber or make zip files, just explaining that this is what I need to do as part of the copy. I really can't think of a less detailed way of asking this without the obvious (but wrong) answer being "just use rsync".

I want to automatically copy folders based on finished upload activity in a Dropbox folder (other services could add files too) on Ubuntu 18.04. I need to:

  • Leave the source folder untouched.
  • Numerically prefix copied filenames if they aren't already prefixed (detectable with `find . -name '[[:digit:]]*.mp3'` and the like); see the sketch after the example below.
  • Cleanse filenames of apostrophes (using, eg, "detox").
  • Create a zip of the folder on the remote side.
  • Re-copy and re-create the zip on the remote side if anything changes in the source folder in the future.

Example: SOURCE folder of 20190203

   apostrophe's.mp3
   track01.mp3
   zebra 4.mp3

REMOTE folder of 20190203 (after processing)

   01-apostrophes.mp3
   02-track01.mp3
   03-zebra4.mp3
   20190203.zip
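
Putting the rename, cleanse and zip steps together, the per-folder processing could look something like this minimal sketch. SRC and DEST are hypothetical paths, and apostrophes/spaces are stripped with tr here because detox's default rules turn spaces into underscores rather than deleting them:

#!/bin/bash
# Minimal sketch of the per-folder processing; SRC and DEST are
# hypothetical paths standing in for the real locations.
SRC=/data/source/20190203
DEST=/data/remote/20190203

rm -rf "$DEST"                      # rebuild the copy from scratch
mkdir -p "$DEST"
cp -a "$SRC/." "$DEST/"             # the source folder stays untouched

n=0
for f in "$DEST"/*.mp3; do          # glob expands in sorted order
    n=$((n + 1))
    base=$(basename "$f")
    clean=$(printf '%s' "$base" | tr -d "' ")   # cleanse apostrophes/spaces
    case $clean in
        [0-9]*) ;;                              # already numerically prefixed
        *)      clean=$(printf '%02d-%s' "$n" "$clean") ;;
    esac
    [ "$base" = "$clean" ] || mv "$f" "$DEST/$clean"
done

# the zip sits inside the copied folder, alongside the renamed files
(cd "$DEST" && zip -q "${DEST##*/}.zip" ./*.mp3)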

If the remote user were to add chickens.mp3 and remove apostrophe's.mp3 from the source folder a month later, the process would update the remote folder by re-copying and renaming the files, and rebuild the zip file, automatically.

All single files likely to be uploaded are less than 10MB, so even the slowest connection isn't likely to take more than 15 minutes to upload any one file, but it could take up to 45 minutes to upload the whole folder.

I can't check for changes based on folder size, number of files or modification date, because the act of adding the zip file to the remote folder will change all of those.

Currently, I have an hourly crontab running a script containing this:

SCANDIRS="$(find "$BASEDIR" -type f -mmin +15 -mmin -45 -printf '%h\n' | sort -u | xargs -n 1 realpath)"

It then loops through SCANDIRS and does the magic, but this probably has lots of problems I've not foreseen, can only run once an hour, and doesn't allow older folders to be updated. (A null-delimited variant that copes with spaces in folder names is sketched below.)
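
For what it's worth, a null-delimited version of that scan avoids word-splitting problems with spaces in folder names; process_folder is a hypothetical per-folder handler:

# scan for folders whose files changed 15-45 minutes ago, NUL-delimited
find "$BASEDIR" -type f -mmin +15 -mmin -45 -printf '%h\0' | sort -zu |
while IFS= read -r -d '' dir; do
    process_folder "$(realpath "$dir")"   # hypothetical handler
done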

I know rsync -av --delete with a regular crontab would work if it were just copying files, but I'm totally stuck on how to do what I want. The copied folders would reside on the same local filesystem (then get s3-synced remotely, if you want to know!).

I think inotifywait might be a solution, but I'm unsure how to handle the "wait until folder is quiescent for a certain amount of time but allow future updates at any time" problem.
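
For the record, one common debounce pattern with inotifywait (from inotify-tools) is to treat any event as a trigger and then keep draining events until a quiet period elapses. WATCHDIR, QUIET and do_sync below are placeholders, not part of the original question:

#!/bin/bash
# Debounce sketch: fire do_sync only after the watched tree has been
# quiet for $QUIET seconds. WATCHDIR, QUIET and do_sync are assumptions.
WATCHDIR=/my/dropbox/folder
QUIET=300   # seconds of silence before the folder counts as finished

inotifywait -m -r -e close_write,create,delete,moved_to,moved_from "$WATCHDIR" |
while read -r event; do
    # an event arrived; drain further events until none for $QUIET seconds
    while read -r -t "$QUIET" event; do :; done
    do_sync   # hypothetical: re-copy, rename and re-zip changed folders
done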

Thank you.

digitaltoast
  • If you have space to keep a staging copy of the folder locally, you can do a local rsync, and then only recreate your zip if anything changed. Otherwise, you could maintain a file of md5 checksums and use it the same way. Or possibly even just `find -newer timestamp`. – jhnc Feb 03 '19 at 18:54
  • It's possible that I didn't explain the above question clearly, but none of the above are possible for the reasons I've outlined, particularly in regards to the changing timestamp/added items in the copy of the folder. – digitaltoast Feb 03 '19 at 19:06
  • So you have no way of querying the state of SOURCE? I'm not talking about querying the state of REMOTE. Isn't the `find -mmin` in your question querying SOURCE? – jhnc Feb 03 '19 at 19:16
  • You've given me an idea - after I generate the zip file on remote, I then regularly do a `find -newer` on any SOURCE file newer than the zip file was created. So how about, for each folder, if no remote.zip OR anysourcefile newer than remote.zip, then `rsync -av --delete` and rerun remote script. Hmmm, I can't see why that wouldn't work. Have I understood correctly? If so, many thanks! That's a great idea! – digitaltoast Feb 03 '19 at 19:28
  • Something like that. I was thinking more `do_sync; touch timestamp;` / `if find source -newer timestamp | grep -qc .; then do_sync_changes; fi`. Then you also pick up deletions in source (because the directory time itself should change). – jhnc Feb 03 '19 at 19:39
  • Ah, of course, my way doesn't account for deleted files! Nice catch there - many, many thanks! BTW, it looks like `grep -qc .` might give a number if any folders (single dot files) match, but I'm not getting anything from it; and out of interest, what advantage does that give over `find source -type d -newer timestamp`? – digitaltoast Feb 03 '19 at 20:11
  • `-type d` will prevent find looking for changed files explicitly (`touch dir/file` won't change `dir`). May be better to use `-cnewer timestamp`. The grep stops results of find spewing onto stdout. – jhnc Feb 03 '19 at 20:30

1 Answer


To summarise my comments, a simple bash script framework to check for changes might look something like:

#!/bin/bash
SOURCE=/my/folder/to/check
WORK=/my/state/folder

# the timestamp file records when the zip was last (re)built
is_initialised(){
    [ -f "$WORK/timestamp" ]
}

# true if anything under $SOURCE has changed since the last run;
# -cnewer also catches deletions, via the parent directory's ctime
has_changed(){
    find "$SOURCE" -cnewer "$WORK/timestamp" | grep -q .
}

update_timestamp(){
    touch "$WORK/timestamp"
}

if ! is_initialised; then
    do_create_zip && update_timestamp || do_show_error
elif has_changed; then
    do_update_zip && update_timestamp || do_show_error
else
    echo "Nothing to do :)"
fi
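
The do_create_zip / do_update_zip hooks are deliberately left open. A hypothetical fill-in, assuming a local staging copy in DEST (the renaming/detox pass from the question would slot in between the rsync and the zip):

DEST=/my/staging/folder

do_create_zip(){
    # --exclude protects the previously built zip from --delete
    rsync -a --delete --exclude='*.zip' "$SOURCE/" "$DEST/" &&
    (cd "$DEST" && zip -q "${DEST##*/}.zip" ./*.mp3)
}

do_update_zip(){
    do_create_zip   # folders are small, so a full rebuild is fine
}

do_show_error(){
    echo "rebuild of $DEST failed" >&2
}
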
jhnc