1

I'm starting a new thread (continuing the thread Backup of machines running linux and windows at Mike Renfro's suggestion; I'm grateful to him as well as to voretaq7).

I intend to use SVN both as a backup solution AND a sync solution for my home directory (say photos, programming source files, documents, etc.). Put together, I think it would amount to something like 300 GB, maybe 500 GB allowing for future growth (maybe a different repository / subfolder for each category).

The number of files will probably be HUGE (100,000 or something like that). I also considered using rsync, but it seems to me it's more of a sync tool than a backup tool (am I wrong?).

I have to sync machines running Windows AND Linux, from a netbook to a big desktop. The repository will be placed on my home server, and syncing against it will keep a copy of the files on each machine I own.

Here are my doubts / questions:

  • Will SVN be able to handle all of this? (The first checkout will probably take something like a month, but anyway...) I mean average-sized files (up to 200 MB; I do NOT intend to back up movies or anything that big using SVN). Might the number of files in the repository be too much for it? Alternatives?
  • If I move / rename a file in the local working copy and then commit, what happens? Does the file get duplicated (under both the old and the new name), or are links created? What happens if I then check out on another machine? Will I get two copies of the same file? Should I use "svn move" in the working copy, or will the OS's move command work anyway?
  • Same question for deleting a file?

I'm also open to alternatives, ideally ones providing versioning, sync and backup features.

user76949
  • 141
  • 7

2 Answers

2

Reading your original post and now this one, it sounds like you're really complicating things instead of using the proven backup techniques suggested by others. Specifically:

  • Use the tools as they were meant to be used: SVN for version control of your source files, rsync/Amanda/Bacula/whatever for your backups, with a sensible backup window (5 minutes is not going to happen) like 12 hours or, even better, 24 hours.

  • Back up to somewhere portable, like a USB drive you can grab in an emergency, or better yet off-site to a Dropbox account for your most important files (which usually take up much less disk space than big binaries like movies, software ISOs, music, etc. that can be replaced/downloaded again later).

  • Use network drives (redirect My Documents and set up mapped drives on the Windows machines, Samba mounts on the Linux machines) and store your data on your new Linux file server with at least RAID1 configured. Put your money into good Intel GbE NICs and a decent GbE switch and you won't notice any ill effects of working over the network vs. locally (and with RAID1 disk mirroring you get the real-time sync you're looking for).

This is pretty much "File Server/backup 101" here, but what works for countless SOHO/small businesses should work for you.

gravyface
  • 13,957
  • 19
  • 68
  • 100
  • So basically your suggestion is to "split" my collection of personal pictures, documents and source files (as well as PDFs, etc.) into three main categories and then, based on that, use rsync / BackupPC or Bacula / SVN? 1) "Archive", and therefore backups (full + incrementals); 2) "Recent files", and therefore SVN; 3) both of them, and therefore rsync? – user76949 Apr 24 '11 at 18:53
  • If that's your idea, I would say it's not half bad. The only problem might be backing up to an offsite location (like Dropbox, though I'd prefer something that allows at least rsync or another standard protocol like FTP, SVN, ...): I have a very SLOW internet connection (120 KB/s [kiloBYTES] upload at peak), so syncing full and incremental tar backups (or whatever) is almost out of the question. – user76949 Apr 24 '11 at 18:59
  • I'm saying back everything up nightly to USB/tape/cloud or wherever, using whichever tool you'd like, but move your day-to-day file access to a redundant file share. That way, all your data is a) central, on a server with disk redundancy; and b) backups are easier/faster because all the data is in one place. – gravyface Apr 24 '11 at 20:05
  • Thanks. Maybe I did not understand parts of your statement: are you saying to keep all my data (archive + share + repositories) on the central server (then back it up using USB or some offsite network backup)? That implies: a) data is not versioned by default (which might not be necessary for most of it), and b) I create a repository (or a subfolder in an existing repository) each time I have to do something important for which versioning might matter? Sorry if I misunderstood something... unfortunately my English is not as good as it should be :( – user76949 Apr 24 '11 at 20:41
  • Mhhh... OK... do you have any suggestions for offsite backup? There are not many services out there you could call cheap. I found CrashPlan (http://www.crashplan.com), which also provides versioning; do you have any other suggestions? Thanks! – user76949 Apr 25 '11 at 05:12
  • Ubuntu One provides 2 GB of online storage for free. – gravyface Apr 25 '11 at 12:50
  • Only for Ubuntu desktop / server, as I recall... currently I use Debian Squeeze and I'm quite happy with it (EACH and EVERY upgrade from one Ubuntu release to another has screwed things up [in my case], and I'm not willing to repeat that experience on a server). I'm not really worried about it being free (in the sense of not having to pay for it), but it should allow at least 20 GB (because personal photos should be backed up as well... that's a big portion of the backup space to be used). Any other suggestions? Thank you very much! – user76949 Apr 25 '11 at 20:49
  • I'd ask in a separate question for recommendations for a cloud backup service. – gravyface Apr 25 '11 at 23:57
0
  1. Well, this is the hardest one. My first instinct is not to go with SVN, simply because of the time it takes. It maintains its own '.svn' directory structure, and I have found it painful with many small files; and that was nowhere near the range you mention, only hundreds or a few thousand files at most. I still don't think it will fare better at larger scales...

I don't have experience with real-time rsync, so I can't comment on that, but from what I have read and heard it's the better idea.
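For reference, "real-time rsync" is usually done by pairing rsync with inotify. A minimal sketch, assuming a Linux client with the inotify-tools package installed and hypothetical source/destination paths:

```shell
#!/bin/sh
# Re-run rsync whenever something changes under the watched tree.
# inotifywait comes from the inotify-tools package (Linux only).
WATCH=/home/user/documents
DEST=homeserver:/srv/sync/documents/

inotifywait -m -r -e close_write,moved_to,delete "$WATCH" |
while read -r _dir _events _file; do
    # rsync only transfers deltas, so running a full sync per
    # batch of events is simpler than tracking individual files.
    rsync -a --delete "$WATCH/" "$DEST"
done
```

This is a sketch, not a production daemon: a burst of changes triggers several back-to-back rsync runs, and the watch has to be restarted after a reboot.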

But what is the exact problem you are trying to solve? You seem to be looking for one best tool for several distinct issues. Your post hints that you would prefer versioning but are mainly interested in sync and backup. I'm guessing here (mind modelling? :-P), but I think you need a combination of tools for the different purposes.

  1. You should use svn move. An OS-level move won't be reflected in the repository info, and SVN will not record/track it: http://svnbook.red-bean.com/en/1.1/re18.html
  2. The same applies to deletes; use svn delete: http://svnbook.red-bean.com/en/1.1/re08.html
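To illustrate both points, the repository-aware way to rename and delete looks like this (the file names are hypothetical):

```shell
# Rename through SVN so history follows the file; internally this
# is recorded as a copy-with-history plus a delete, so other
# machines get a clean rename on their next update, not a
# duplicate under old and new names.
svn move photos/img_0001.jpg photos/beach-2011.jpg

# Delete through SVN so the removal itself is versioned; the file
# stays retrievable from earlier revisions of the repository.
svn delete notes/obsolete.txt

# Both changes reach the repository in a single commit.
svn commit -m "Rename photo, remove obsolete notes"
```

If you rename or delete with the OS instead, `svn status` reports the old path as missing (`!`) and a renamed file as unversioned (`?`), and you have to repair the bookkeeping by hand.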
  • Thank you for your quick answer. What I wanted to achieve was mainly backup and versioning (the sync would simply be a consequence of that). The problem for me with rsync would be versioning (and also: does it check, before syncing, whether both files have been edited since the last sync?). Do you have other suggestions (repository systems, backup systems, ...)? In the discussion linked I came to the conclusion that SVN was the best solution for me. – user76949 Apr 24 '11 at 18:08
  • I have a server (AMD quad core) which will contain the central repository with all revisions; all clients will sync from and to it when work has to be done or saved. That way redundancy will also be achieved. That was the basic idea behind all this. – user76949 Apr 24 '11 at 18:10
  • If you want redundancy, I would recommend bzr/Mercurial/any other distributed version control system. That way you have multiple copies; only, they might all be at different versions... – Software Mechanic Apr 24 '11 at 18:32
  • What do you mean? In SVN you also have copies at different versions (don't you?). Versioning should be provided by SVN itself. By redundancy I meant "if the SVN repository on the server should fail". But I get your point... it would be better to svnsync the server if I wanted "true" redundancy, i.e. keep all the old versions as well. On the server, the SVN (or whichever other versioning system) repository would sit on a RAID-6 array of 4 to 6 drives (well... among other things :D). – user76949 Apr 24 '11 at 18:40
  • Sorry, I only now read the original question. I agree with gravyface. All you seem to need is SVN plus an rsync of the SVN repository. You seem to be speculating too much on possible future problems instead. – Software Mechanic Apr 24 '11 at 18:43
  • That may be the case. My main concern with SVN (because of the move command) and rsync (because of the dangerous "--delete" option) is what happens when files get deleted or renamed. I would like to think about it now, and not in a few months when I'll have a full backup repository/whatever to convert and/or move (or duplicated files to take care of). – user76949 Apr 24 '11 at 19:02
  • Yeah, but rsyncing the SVN repository, with SVN itself for versioning, should work... also a periodic svn dump of the repo would be a good additional idea (in case rsync failed for some .svn files)... – Software Mechanic Apr 24 '11 at 19:15
  • OK. The only thing I do not understand is what you mean by "rsyncing the svn repository". My idea was doing a checkout from the repository to the clients (so that if some client modifies a file, it can svn it back as a new version). Would using rsync not defeat that purpose? I agree with @gravyface that what he proposed might be better in some ways (MUCH LESS disk space used on client computers), and typically I'm not in a rush to get at my files when I'm offline (if I'm on the train I would have downloaded the files I need anyway). But using a file server like he said requires offline backup – user76949 Apr 24 '11 at 19:23
  • and / or USB external drives to do a REAL backup (because RAID is not a backup tool!), which might be painful depending on how backups are made (see my comments on gravyface's suggestion about my internet connection being slow: rsyncing small files is surely possible, but copying tar dumps etc. is not going to happen). – user76949 Apr 24 '11 at 19:26
  • But if I use the central file server, that would defeat the purpose of SVN for backups, and then I do not have versioning... mhhh. Seems like a dead end. The ideal at this point would be a file share, as gravyface suggested, backed by something that copies a file to a new version whenever it gets modified (a sort of pseudo-versioning filesystem). Seems like versioning is not going to happen (or maybe I should just create an SVN repo with the documents I'm actively editing [like 100 files at most], while keeping all other files on the file server)? A sort of SVN-on-the-fly / ad hoc system?? – user76949 Apr 24 '11 at 19:30