17

It takes forever to back up. Until we can trust btrfs or ZFS to back up incremental snapshots, wouldn't it be nice if there were a daemon that used inotify to keep track of which files had actually changed, so backups would run more quickly? Where is this program?

How do I back up my Linux box without having to crawl the whole filesystem every time? I would like a program that would detect the new or changed photos, source code, etc. and queue them to be copied over to my NAS.

joeforker
  • You haven't really given much information here, especially regarding what you are currently using to back up and what type of data is involved. There are several pieces of backup software which use journals to track changes to the filesystem, and then refer to that journal during a backup. What exactly are you trying to back up, to what sort of device/application, what is the nature of the data, and what is your current method for backing it up? – WerkkreW May 13 '09 at 21:18
  • @WerkkreW, I don't think what he is trying to back up is all that important. I suspect anything that allows for event-based backup would be interesting to learn about. The request isn't unusual; OS X has Time Machine, which is event-based. – Zoredache May 13 '09 at 21:52
  • I still feel the question needs a bit more clarification before it can be answered. – WerkkreW May 13 '09 at 23:09
  • 1
    @Zoredache, of course what I'm backing up is important! If it wasn't important... oh wait ;-) I want to back up my home directory without having an unusable machine for the n hours it would take for rsync to crawl the whole thing, when the only new stuff is a set of photos of my newborn baby. – joeforker May 14 '09 at 01:12
  • 3
    The question seems perfectly clear to me: Mac OS X's Time Machine backup is fast because it monitors fsevents, so it knows where to look when it runs a backup. Linux has inotify, a similar facility to fsevents, and yet Linux backup solutions that try to approximate Time Machine (e.g. dirvish) are miserably slow because they don't take advantage of inotify. Are there any that do? – bendin Aug 20 '09 at 16:31

7 Answers

11

I answered my own question with "yum search inotify". It's called lsyncd and it's hosted on Google Code.

Unfortunately it looks like it always runs a full rsync first, so it still wouldn't help me if my computer was not turned on for more than 14 hours at a time.

Lsyncd uses rsync to synchronize local directories with a remote machine running rsyncd. Lsyncd watches multiple directory trees through inotify. The first step after adding the watches is to rsync all directories with the remote host; after that it syncs single files by collecting the inotify events. So lsyncd is a lightweight live mirror solution that should be easy to install and use while blending well with your system. See lsyncd --help for detailed command line options.
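To make the "collect events, then sync only those files" step concrete, here is a minimal Python sketch. It is not lsyncd's actual implementation: the function names, the example paths, and the choice of rsync's `--files-from` option are assumptions made for illustration. It only deduplicates a batch of changed paths and builds the rsync command line; it does not talk to inotify or run rsync.

```python
import os
from typing import Iterable, List


def coalesce_events(root: str, changed_paths: Iterable[str]) -> List[str]:
    """Collapse raw change events into sorted, unique paths relative to root."""
    root = os.path.abspath(root)
    unique = set()
    for path in changed_paths:
        full = os.path.abspath(path)
        # Ignore events that fall outside the watched tree.
        if os.path.commonpath([root, full]) != root:
            continue
        rel = os.path.relpath(full, root)
        if rel != ".":
            unique.add(rel)
    return sorted(unique)


def build_rsync_command(root: str, list_file: str, target: str) -> List[str]:
    """Build (but do not run) an rsync command that copies only the listed files.

    list_file is a hypothetical text file holding one relative path per line,
    which rsync reads through its --files-from option.
    """
    source = root.rstrip("/") + "/"
    return ["rsync", "-a", f"--files-from={list_file}", source, target]
```

For example, a burst of events for the same photo collapses to a single entry, so the file is queued only once per batch. Deleted files would need separate handling, which this sketch leaves out.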

joeforker
  • That's an interesting link, I had even thought of implementing that myself. But why does it say on the page that it will retransfer large files for each change? I thought rsync itself would already avoid that? – Hanno Fietz Jul 02 '09 at 13:06
  • rsync still has to re-read the entire file on both ends to transfer it as efficiently as it can. The lsyncd documentation intends to say that this may not be efficient for large files. For large files a block-level replication scheme would be more appropriate. – joeforker Jul 06 '09 at 13:15
3

There's a new system called fsnotify that's designed to solve the deficiencies of inotify, which in turn was introduced to solve the problems of dnotify. fsnotify lets you watch an entire filesystem without much fuss. Hopefully fsnotify will help solve all our future Linux backup problems.

joeforker
2

Lsyncd syncs the whole watched tree on startup because in 99% of cases this is the sensible thing to do. You want the directory on the target host to match the one on the local host, otherwise later syncs might fail, and you want to pick up the changes you missed while the machine was turned off. However, if you know what you are doing, you can turn off startup syncing: just set sync{..., startup=false} in the Lsyncd config file.

Regarding inotify, it's not the number of files but the number of directories that eats up resources. One directory is one watch, regardless of how many files it contains.

fanotify, which like inotify builds on fsnotify, looked very promising for people watching tons of directories, but as of Linux 2.6.37 fanotify does not report rename (move) events at all, making it unusable for a job like this :-(
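Since the cost is one watch per directory, a quick way to gauge whether a tree is feasible to watch is to count its directories and compare that with the per-user watch limit. The following Python sketch does that; the function names are made up for illustration, and it assumes a Linux system where the limit is exposed at /proc/sys/fs/inotify/max_user_watches.

```python
import os


def count_directories(root: str) -> int:
    """Count root plus every directory beneath it (symlinked directories are not followed)."""
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        total += 1  # one inotify watch per directory, however many files it holds
    return total


def read_watch_limit(path: str = "/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None
```

Calling `count_directories(os.path.expanduser("~"))` and comparing the result with `read_watch_limit()` gives a rough idea of whether a whole home directory fits under the limit. Note that the count itself still crawls the tree once.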

axkibe
2

You can hack something with incron.

 /path1    IN_CLOSE_WRITE     rsync  -au $@/$#  backuphost:/path
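
In an incrontab entry, `$@` stands for the watched path and `$#` for the name of the file the event refers to, so each completed write triggers an rsync of just that one file. The Python sketch below only imitates that substitution to show what command would result; the function name, the file name `photo.jpg`, and the target `nas.example:/backup` are hypothetical, and it does not use incron itself.

```python
def expand_incron_command(template: str, watched_path: str, event_name: str) -> str:
    """Expand the incron wildcards $@, $# and $$ in a command template."""
    out = []
    i = 0
    while i < len(template):
        pair = template[i:i + 2]
        if pair == "$$":
            out.append("$")           # literal dollar sign
            i += 2
        elif pair == "$@":
            out.append(watched_path)  # the watched directory
            i += 2
        elif pair == "$#":
            out.append(event_name)    # the file the event refers to
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


# Hypothetical example: a photo written into the watched directory /path1.
command = expand_incron_command("rsync -au $@/$# nas.example:/backup", "/path1", "photo.jpg")
```

Here `command` ends up as `rsync -au /path1/photo.jpg nas.example:/backup`. Because a watch covers a single directory, subdirectories of /path1 would each need their own entry.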
hayalci
2

Based on research (not testing), it seems that inotify can't handle the very large number of files on most systems and/or is very slow doing so. The thread at http://www.pubbs.net/kernel/200905/109416/ was the most useful. It pointed to a new Linux feature, fsnotify, that seems to be in or headed towards mainline and is in Linux 2.6.31 and later.

user26055
1

I have spent 6 months looking for the best solution for what you're trying to do: efficiently back up to a NAS. After the initial sync, all else is smooth as butter. The latest version of Lsyncd works quite well. I've documented what I've done in the link below. Just substitute your folder values. Hope this helps:

https://docs.google.com/document/d/1XpqM5h5YMwuQqzdknyDDnjcQVYGjAsyAxfYprqSnhd0/edit

Bobo
0

There's a new system used in Asia called sersync, which replaces the inotify-tools + rsync solution: http://code.google.com/p/sersync/ It is very easy to use.