2

This question is hard to Google because search results are dominated by the log files that rsync itself generates, which is not what I'm interested in.

What I'd like to do is to use rsync to retrieve log files from a set of servers. Normally when I want a bunch of personal files from my home server to my laptop I'll do something along the lines of:

rsync --rsh='ssh' -av --progress --partial user@host:source destination

to, say, retrieve my vacation videos to show my family. But now I would like to use rsync to retrieve log files from a set of servers to a backup location. Since my log files are append-only and frequently written to, is there a way to make rsync smart enough to "follow" the changes rather than revalidating the initial million lines over and over again? If not, is there another application people use for this? I don't want to ship logs over UDP, because I want to handle spikes in traffic without losing data, and streaming over TCP would be too slow; so I'd like to write to local files, with backups that catch up during periods of inactivity.

zachaysan

3 Answers

2

I'm having the same problem: I wanted to build a log-centralization setup using rsync. The trouble is that whenever a source log file is updated, rsync transfers the whole file to the remote server and replaces the old copy. That behavior is wasteful, and it was driving me mad.

Turns out, rsync has a --append flag which allows rsync to update only "the new parts" of a log file. (I tested only on log files)

From man page:

--append     append data onto shorter files

From explainshell.com:

--append

This causes rsync to update a file by appending data onto the end of the file, which presumes that the data that already exists on the receiving side is identical with the start of the file on the sending side. If a file needs to be transferred and its size on the receiver is the same or longer than the size on the sender, the file is skipped. This does not interfere with the updating of a file’s non-content attributes (e.g. permissions, ownership, etc.) when the file does not need to be transferred, nor does it affect the updating of any non-regular files. Implies --inplace, but does not conflict with --sparse (since it is always extending a file’s length).

For example

rsync -avz --append /source/dir /dest/dir

It won't re-validate the entire file; it only appends the new data.

Annahri
  • Cool! I haven't had a chance to test it but if it works then great! – zachaysan Jan 08 '20 at 15:45
  • 1
@zachaysan To give some context: we wanted to tail log files of apps for developers so they don't have to log on to the servers directly. Before using this `--append` thing, we were unable to do 'live tailing'; we had to close the `tail -f` and run it again, because rsync deletes the old log file and replaces it with the new one. But now we can do real-time log tailing using `--append` with rsync. It works, at least for us. – Annahri Jan 08 '20 at 21:25
  • Thanks for clarifying. I think it will work for the original question, so thanks for submitting this answer and subsequent clarification. – zachaysan Jan 09 '20 at 02:37
1

You could consider using logrotate to split them into smaller files automatically.

Use the dateext option in /etc/logrotate.conf so that rotated logs keep a consistent filename (i.e. they aren't renamed again after each subsequent rotation, which is the default behavior and doesn't play well with rsync).
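For example, a minimal logrotate entry along these lines (the app name and path are placeholders) produces stably-named rotated files such as app.log-20150827 instead of the shifting app.log.1, app.log.2, …:

```
/var/log/app/*.log {
    daily
    rotate 14
    dateext          # name rotated files app.log-YYYYMMDD instead of app.log.1
    compress
    delaycompress    # keep the most recent rotated file uncompressed
    missingok
    notifempty
}
```

Because each rotated file keeps its name forever and never changes after rotation, rsync transfers it once and then skips it on every later run; only the small current file is re-examined.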

Eborbob
  • Thanks for the idea. It's not bad in general, but I was hoping to have a more real time solution in the average case, but I suppose I could break it up into lots of little files. – zachaysan Aug 27 '15 at 15:27
  • @zachaysan What do you mean by `a more real time solution`? – Eborbob Aug 27 '15 at 16:17
I'd like a way to tell rsync that the beginning of a file will never change. For very large files I don't want rsync / the operating system to have to read from the beginning of the file just to add the last 0.05% since the last time it checked. For example, with `less +G` I can read from the end of a large file extremely quickly. I want rsync to do something similar. – zachaysan Aug 27 '15 at 20:35
  • If you rotate the logs and they're only say 20MB each, is that a problem? Would you notice the difference with it processing the start of such a file as opposed to skipping the first 19MB? – Eborbob Aug 28 '15 at 09:17
0

2020 Edit:

I haven't had a chance to validate that rsync now supports this, but another answer mentions the --append flag. In case it doesn't work for everyone, I don't want to mislead anyone, so I'm keeping my original answer below for now.

Original:

After reading through the source code for rsync, I've determined that:

  1. There is currently no way to set a flag to do this.
  2. The way rsync works is that it reads the file, calculates a hash for each chunk, and sends the hashes back to the calling process, which matches them against its own. So it does do a full file read each time, even though it doesn't consume much bandwidth.

For now I'll use the logrotate solution, but I'm leaving this question unanswered because I'm still convinced that there must be a better solution that just works out of the box.
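The logrotate workaround amounts to something like the following crontab entries (times, paths, and the logrotate config name are illustrative, not a tested setup): rotate the logs into small, stably-named files, then let rsync catch up a few minutes later, so it only ever re-reads the small current file.

```
# rotate the app logs hourly, then pull them shortly afterwards
0 * * * *   /usr/sbin/logrotate /etc/logrotate.d/app
5 * * * *   rsync -az user@host:/var/log/app/ /backup/logs/app/
```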

zachaysan