I have an SFTP server where clients are constantly uploading large files. On a periodic basis, I want to copy all complete (fully uploaded) files to another machine for processing. I don't want to copy a file that is actively being written to. Is there a way to accomplish that? I am currently using rsync but I am open to switching to something else.
To check whether a file is currently open (a file that is still being written is certainly held open by some process), the standard tool is lsof:
if lsof /your/file > /dev/null; then echo "file currently open"; fi
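You can use this check to filter the find results down to files that are not currently open, and feed the resulting list to rsync:
find . -type f -exec sh -c 'for f; do lsof "$(readlink -f "$f")" > /dev/null 2>&1 || printf "%s\0" "$f"; done' sh {} + | rsync -avz --from0 --files-from=- ./ user@host:destination/
Some notes:
- readlink -f resolves each path to its full (absolute) path before it is passed to lsof.
- Each path is printed relative to the current directory (for example ./subdir/file) and terminated with a NUL byte by printf "%s\0". This is the same format find -print0 produces, and it is what --from0 expects. Because --files-from implies --relative, rsync recreates the subdirectory layout at the destination.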
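If you want to inspect the selection before transferring anything, the same filter can be wrapped in a small shell function. This is a minimal sketch, not part of the original answer: the function name list_unopened_files is hypothetical, and it assumes lsof and readlink -f are installed. It also assumes it runs with enough privileges (typically root) for lsof to see the SFTP server's processes, since lsof cannot report files held open by processes it is not allowed to inspect.

```shell
# list_unopened_files DIR
# Hypothetical helper: prints every regular file under DIR (as ./relative/path)
# that lsof does not report as open by any process, each terminated by a NUL byte.
list_unopened_files() {
    dir=${1:-.}
    # Without lsof every file would look "not open", so refuse to run.
    command -v lsof >/dev/null 2>&1 || { echo "lsof not found" >&2; return 1; }
    (
        cd "$dir" || exit 1
        find . -type f -exec sh -c '
            for f; do
                # lsof exits 0 if it lists at least one open instance of the file.
                if ! lsof "$(readlink -f "$f")" >/dev/null 2>&1; then
                    printf "%s\0" "$f"
                fi
            done
        ' sh {} +
    )
}
```

For example, with a hypothetical upload directory /srv/sftp/uploads, list_unopened_files /srv/sftp/uploads | tr '\0' '\n' previews the list, and list_unopened_files /srv/sftp/uploads | rsync -avz --from0 --files-from=- /srv/sftp/uploads/ user@host:destination/ transfers it. The paths are relative to the directory passed in, so that same directory must be the rsync source.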

lgaggini
- Where do I specify the path to rsync? – Freedo Apr 04 '17 at 08:38
- The last part of the command: `user@host:destination` to rsync to a remote path or simply `destination` to rsync to a local path. – lgaggini Apr 05 '17 at 04:52
- Any way to get relative paths? – overflowed Jun 10 '22 at 09:18
One challenge here is to determine whether the files are still being written to. There is no perfect way to do this. I think the best you can do is to check the last-modified timestamp of each file and only copy the files that have not been modified for a few minutes.
rsync by itself cannot do this, but you can combine it with the find command:
cd /path/to/directory/with/files
find ./ -type f -mmin +5 -print0 | rsync --archive --verbose --from0 --files-from=- ./ yourotherserver:targetdir/
This command does two things:
- It uses find ./ -type f -mmin +5 -print0 to identify all files that have not been modified for more than 5 minutes. -print0 separates the names with NUL bytes instead of newlines.
- It then feeds this list into rsync using the --from0 and --files-from parameters (--from0 tells rsync to expect NUL-separated names). This makes rsync consider only the files that find has identified.
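The timestamp check and the lsof check from the other answer can also be combined, which is more conservative than either alone: for example, an upload that has stalled for longer than the threshold still has an old timestamp, but the file is still open. The following is a minimal sketch of that combination, not the method of either answer; the function name list_settled_files is hypothetical, and it assumes lsof and readlink -f are available and that it runs with enough privileges (typically root) for lsof to see the SFTP server's processes.

```shell
# list_settled_files DIR [MINUTES]
# Hypothetical helper combining both checks: prints (as ./relative/path, each
# terminated by a NUL byte) every regular file under DIR that was last modified
# more than MINUTES minutes ago AND is not reported as open by lsof.
list_settled_files() {
    dir=${1:-.}
    minutes=${2:-5}
    # Without lsof every file would look "not open", so refuse to run.
    command -v lsof >/dev/null 2>&1 || { echo "lsof not found" >&2; return 1; }
    (
        cd "$dir" || exit 1
        # find applies the cheap timestamp test first, so lsof only runs on
        # files that have already been unmodified for MINUTES minutes.
        find . -type f -mmin +"$minutes" -exec sh -c '
            for f; do
                # lsof exits non-zero when no process has the file open.
                lsof "$(readlink -f "$f")" >/dev/null 2>&1 || printf "%s\0" "$f"
            done
        ' sh {} +
    )
}
```

For example, with a hypothetical upload directory /srv/sftp/uploads: list_settled_files /srv/sftp/uploads 5 | rsync --archive --verbose --from0 --files-from=- /srv/sftp/uploads/ yourotherserver:targetdir/. Adding --dry-run to the rsync options first shows what would be transferred without copying anything.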

olav