Collaboration partners of our company will upload files (typically a few GB in size) to a directory on one of our external servers.
From that directory, I will move them to our internal network, where they will ultimately be consumed by our analysts.

So far, the upload is handled with a chroot jail and works fine, as does what happens once the files arrive in our internal network.

The transfer to our internal network, however, is problematic. I simply rsync the files with --remove-source-files, followed by a find to delete the empty directories.
The problem is that the cron job's polling interval on the directory needs to be short (we'd prefer every minute), while the transfer time is fairly long (our office DSL is slow), and obviously we don't want to start transferring the same file again on every poll.
Is there a good solution for this problem? I could move the contents to a temporary directory and then rsync from there, but I feel like a more elegant solution exists.
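For reference, the transfer step described above can be sketched roughly as follows. This is only an illustration: the function name `transfer_and_prune` and the exact flags (`-a`, `-mindepth 1`) are assumptions, not the actual script, and `-empty`/`-delete` assume GNU find.

```shell
#!/bin/sh
# Hypothetical helper: copy everything from $1 to $2, removing each source
# file once it has been transferred, then prune the empty directories that
# rsync leaves behind (--remove-source-files only removes files).
transfer_and_prune() {
    src=$1
    dest=$2
    rsync -a --remove-source-files "$src"/ "$dest"/ &&
        # -mindepth 1 keeps the watched directory itself in place.
        find "$src" -mindepth 1 -type d -empty -delete
}
```

Run from cron without any locking, two overlapping invocations of this would both pick up the same half-transferred file, which is exactly the problem.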

Nils Landt
  • As a minor-aside, how are you synchronising the partner<->server side of things i.e. how do you ensure that you don't start rsyncing-and-deleting before they've finished uploading? – nickgrim Sep 18 '12 at 12:45
  • That's part of the upload script I prepared for them: they upload to `dir a`, and when the upload is done, the script moves the file to `dir b`, which is the watched dir. Since this is a move on the same partition, it's nearly instantaneous – Nils Landt Sep 18 '12 at 17:28

2 Answers

lockrun was designed specifically for this use case:

$ crontab -l
* * * * * lockrun --lockfile=/tmp/.partner-sync -- rsync src/ dest/

That will trigger every minute but only run if /tmp/.partner-sync is not already locked.

  • And if you have problems with bandwidth, you can use the `--bwlimit` switch for rsync... – Jan Marek Sep 18 '12 at 11:58
  • I think this almost-but-not-quite solves the problem. I think the OP is hoping that if `file2` is ready to sync whilst `file1` is still syncing, then the transfer of `file2` will start immediately (rather than "on the next poll after `file1` has finished") – nickgrim Sep 18 '12 at 12:50
  • @nickgrim This is actually perfect: if file2 were to start rsyncing immediately, it would just congest our pipes, meaning file1 would be finished much later. – Nils Landt Sep 18 '12 at 17:29

In the end, I went with flock simply because lockrun (which Darren Chamberlain mentioned) does not have a maintained deb package.
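For anyone taking the same route, here is a minimal sketch of the flock variant, assuming flock(1) from util-linux is installed. The lock file path and the `run_locked` helper name are made up for illustration and are not the actual setup.

```shell
#!/bin/sh
# Hypothetical lock file path; any file writable by the cron user works.
LOCKFILE="${LOCKFILE:-/tmp/.partner-sync.lock}"

# Run the given command only if nobody else currently holds the lock.
# -n makes flock exit with a non-zero status right away instead of waiting,
# so an overlapping cron run is simply skipped.
run_locked() {
    flock -n "$LOCKFILE" "$@"
}
```

A script started by cron every minute could then call something like `run_locked rsync -a --remove-source-files src/ dest/`. While a previous transfer still holds the lock, the new invocation exits immediately, and the next file is picked up on the first poll after the lock is released.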

Nils Landt