I have an rsync backup script that transfers data between two Ubuntu servers located in different countries. The data set is large in terms of number of files and about 17 GB in total size. The script runs on the receiving server, so it is basically a pull. Public/private key authentication is used for login.
The script itself is fine; the backup ran successfully for many months.
Lately, for the past 6 days or so, the backups have not completed. The rsync process runs for about 45 minutes and then just ends. I have no idea why it stops. From what I can see, it does not even finish building and scanning the file list. I have the cron output directed to a log file, and all I see in the log is "receiving file list ... done", yet nothing has been transferred into the backup destination.
If I run the script manually, after about 45 minutes, I just see this: ./sync.sh: line 51: 9078 Killed $RSYNC $OPTIONS $SOURCE $DESTINATION
How and where do I see the reason for the failure? How do I know which server is actually killing the process, the sender or the receiver?
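On the "which server" part: the word "Killed" in that message is printed by the local shell when its own child process dies from SIGKILL, so the kill happened on the machine running the script (the receiver). A minimal sketch for making this visible in the script is to inspect the exit status right after the rsync line. The function name describe_status is made up for illustration, and the rsync codes it knows are only the ones that show up in the logs further down this post.

```shell
# describe_status CODE: translate an exit status into a short explanation.
# A status above 128 means the process died from signal (CODE - 128),
# e.g. 137 = 128 + 9 (SIGKILL).
describe_status() {
    code=$1
    if [ "$code" -gt 128 ]; then
        echo "killed locally by signal $((code - 128))"
        return
    fi
    case "$code" in
        0)  echo "success" ;;
        2)  echo "rsync: protocol incompatibility" ;;
        12) echo "rsync: error in rsync protocol data stream" ;;
        14) echo "rsync: error in IPC code" ;;
        19) echo "rsync: received SIGUSR1" ;;
        *)  echo "exit status $code (see the rsync man page)" ;;
    esac
}

# In sync.sh this would go directly after the rsync line:
#   $RSYNC $OPTIONS $SOURCE $DESTINATION
#   describe_status $?
```

This only says that the local process was killed by a signal, not who sent it; the kernel log is the place to look for that.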
The pulling machine (where the script runs) is a low-end box: a KVM VM with 256 MB of RAM. So I am wondering whether building the file list takes up too much RAM and causes an OOM error. But how do I check if this is the case? Moreover, there has been no significant increase in the number of files that would explain the sudden failures.
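One way to check the OOM theory is to search the kernel log on the receiver for OOM-killer messages after a failed run. The sketch below assumes a stock Ubuntu setup where kernel messages are available through dmesg and /var/log/kern.log; the function name find_oom_kills is hypothetical, and the patterns are based on the wording the Linux kernel typically uses for these events.

```shell
# find_oom_kills [FILE...]: print kernel log lines left by the OOM killer.
# Reads the given files, or stdin when no file is given.
find_oom_kills() {
    grep -Ei 'out of memory|oom-killer|killed process' "$@"
}

# Typical use on the receiver after a failed run (may need root):
#   dmesg | find_oom_kills
#   find_oom_kills /var/log/kern.log
```

If rsync's PID from the "Killed" message shows up in one of the matching lines, the kernel on the receiver killed it for lack of memory.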
Any tips would be appreciated.
Thanks.
Update 1
As suggested by @APZ, I added a couple more verbose flags (3 in total) and ran the script manually, redirecting the output to a file. Here is the output at the end:
(.... lots of file names....)
received 5795917 names
done
recv_file_list done
get_local_name count=5795917 /storage/ <======== Reached here after about 40 minutes. Was stuck here for about 10 minutes or so.
[Receiver] _exit_cleanup(code=14, file=main.c, line=788): about to call exit(14)
rsync: fork failed in do_recv: Cannot allocate memory (12)
rsync error: error in IPC code (code 14) at main.c(788) [Receiver=3.0.9]
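A verbose run like the one above could be captured along these lines. This is only a sketch: run_logged and the log path are made-up names, and it assumes the extra -v flags have already been added to OPTIONS inside sync.sh.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Run a command with stdout and stderr sent to LOGFILE, then append the
# exit status so the end of the log shows how the run finished.
run_logged() {
    logfile=$1
    shift
    rc=0
    "$@" >"$logfile" 2>&1 || rc=$?
    echo "exit status: $rc" >>"$logfile"
    return "$rc"
}

# Hypothetical use:
#   run_logged /tmp/rsync-debug.log ./sync.sh
#   tail -n 20 /tmp/rsync-debug.log
```

Keeping the exit status in the same file makes it easier to match a line such as "error in IPC code (code 14)" with what the script actually returned.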
To answer @TimHaegele, as far as I know, the VM host (Prometeus / IperWeb) does not do any limiting of CPU, IO or anything. I could ask them, though. They are extremely highly rated.
My Ubuntu installation on the VM has 512 MB swap configured. Maybe I can increase that to say 2 GB or so? Disk space is not a problem.
While rsync is running, this is the output of free -m:
             total       used       free     shared    buffers     cached
Mem:           239        236          2          0          0          3
-/+ buffers/cache:        232          7
Swap:          511        510          1
Based on this evidence, would it still make a difference to change the SSH Daemon settings, as suggested?
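To watch how close the box gets to running out of swap during a run, the free -m output can be reduced to a single number. A small sketch, assuming free prints a "Swap:" row with total and used in the second and third columns as in the output above; swap_used_percent is a made-up name.

```shell
# swap_used_percent: read `free -m` output on stdin and print the
# percentage of swap in use as an integer (rounded down).
swap_used_percent() {
    awk '/^Swap:/ && $2 > 0 { printf "%d\n", $3 * 100 / $2 }'
}

# Usage:
#   free -m | swap_used_percent
```

With the figures shown above (510 of 511 MB used) this prints 99.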
Update 2
The consensus seems to be that low memory is the issue. So, I added a new swap file of 2GB and activated it. So, now I have a total of 2.5 GB of swap.
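For reference, adding a swap file on Ubuntu usually goes roughly as sketched below. The path /swapfile2 and the function names are made up for illustration, and a real run needs root. With DRY_RUN=1 the commands are only printed, which is what the last two lines do.

```shell
# run: execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_swapfile PATH SIZE_MB: create and activate a swap file.
add_swapfile() {
    path=$1
    size_mb=$2
    run dd if=/dev/zero of="$path" bs=1M count="$size_mb" &&
    run chmod 600 "$path" &&
    run mkswap "$path" &&
    run swapon "$path"
}

# Preview the steps for a 2 GB file (the path is a made-up example).
DRY_RUN=1
add_swapfile /swapfile2 2048
```

To keep the swap file across reboots, an /etc/fstab line of the form "/swapfile2 none swap sw 0 0" is normally added as well.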
Then, I ran the script again (manually). This time, it ran for more than 90 minutes. It was transferring the files by this time. But then suddenly, the process quit. In the logs, I see that it terminated with the following error:
Invalid packet at end of run (4330026) [sender]
[generator] _exit_cleanup(code=12, file=io.c, line=1532): about to call exit(12)
rsync error: protocol incompatibility (code 2) at main.c(695) [sender=3.0.7]
rsync: writefd_unbuffered failed to write 23 bytes to socket [generator]: Broken pipe (32)
rsync error: error in rsync protocol data stream (code 12) at io.c(1532) [generator=3.0.9]
[receiver] _exit_cleanup(code=19, file=main.c, line=1316): about to call exit(19)
rsync error: received SIGUSR1 (code 19) at main.c(1316) [receiver=3.0.9]
As you can see, the sender machine has rsync 3.0.7 and the receiver (puller) has 3.0.9. I don't quite understand what the error means.
Meanwhile, I saw @APZ's comment and modified my script to replace --delete-after with --delete-delay. I am running it again now and will get back with updates.
Update 3
Adding more swap and using --delete-delay instead of --delete-after seems to have done the trick. The regular cron job seems to be running properly as well.
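A likely reason the option switch helps, going by the rsync man page rather than anything measured here: --delete-after needs the complete file list up front and therefore disables the incremental recursion that rsync 3.x otherwise uses, while --delete-delay works out the deletions during the transfer and applies them at the end, so the full list of several million names never has to sit in memory at once. The relevant part of sync.sh might look like the sketch below. The variable names come from the "Killed" message earlier in the post; -az and the source host are placeholders, and /storage/ is taken from the verbose output.

```shell
# Relevant part of sync.sh (sketch, values are mostly placeholders).
RSYNC=rsync
OPTIONS="-az --delete-delay"                 # previously: --delete-after
SOURCE="backup@sender.example.com:/data/"    # hypothetical source
DESTINATION="/storage/"

# Print the command instead of running it, to check the option string.
echo $RSYNC $OPTIONS $SOURCE $DESTINATION
```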
Also, I have followed this article to make rsync run with sudo on the sending machine. This has also removed the Permission denied (13) warnings during the transfer.
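The article itself is not reproduced here. One common way to do this, which is an assumption and not necessarily what the article describes, is rsync's --rsync-path option together with a sudoers rule on the sender that lets the backup user run rsync without a password. The sketch below also shows a quoting pitfall: the value "sudo rsync" contains a space, so it would be split in two if it were kept in a plain $OPTIONS string. The function name, user name and host are hypothetical.

```shell
# pull_backup SRC DST: pull with rsync running under sudo on the sender.
# Set RSYNC=echo to preview the command instead of running it.
pull_backup() {
    src=$1
    dst=$2
    # Positional parameters keep "sudo rsync" as ONE argument.
    set -- -az --delete-delay --rsync-path="sudo rsync"
    "${RSYNC:-rsync}" "$@" "$src" "$dst"
}

# Preview only (the source host is a made-up example).
( RSYNC=echo; pull_backup backup@sender.example.com:/data/ /storage/ )

# Matching sudoers entry on the sender, edited with visudo
# (user name and rsync path are assumptions):
#   backup ALL=(root) NOPASSWD: /usr/bin/rsync
```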
Thanks for the help, everyone.
P.S.: Everybody who participated in this Q&A gave helpful suggestions. Unfortunately, I can only mark one correct answer.