4

I'm a sole developer and I have 3 computers. A friend of mine also has an account, but never uses it. My current setup is Git plus my own ad hoc backup job that clones the repository, zips it, encrypts it with GnuPG, and sends it to a remote FTP server. This has worked really well so far for my source code/DocBook/XML files plus smaller binary files (mostly icons and some images for use with DocBook).
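
For reference, the backup job is just a small shell script along these lines (the paths, host name and key below are placeholders, not my real setup):

    #!/bin/sh
    # Ad hoc backup: clone, zip, encrypt, upload.
    STAMP=$(date +%Y%m%d)
    WORK=/tmp/backup-$STAMP
    mkdir -p "$WORK"

    # 1. Take a bare clone of the repository
    git clone --bare ~/projects/mygame "$WORK/mygame.git"

    # 2. Zip the clone
    (cd "$WORK" && zip -qr "mygame-$STAMP.zip" mygame.git)

    # 3. Encrypt the archive with GnuPG
    gpg --encrypt --recipient me@example.org "$WORK/mygame-$STAMP.zip"

    # 4. Upload the encrypted archive to the remote FTP server
    curl -T "$WORK/mygame-$STAMP.zip.gpg" ftp://user:password@ftp.example.org/backups/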

However, I recently got involved in a Flash game project and made a repository just for it, but Flash FLA files are huge (I have seen ones over 70 MB), and git repack can no longer complete, even with a low window size. Furthermore, I discovered that some of my files use CRLF line endings instead of LF, and Git is not happy diffing these files. When I created the repository I ought to have set it to automatically convert CRLF to LF on commit, but I was unaware of these problems at the time.
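
For what it's worth, the repack commands I have been running look roughly like this (the exact numbers vary; these are just examples):

    # Repack with a small delta window and a memory cap per window
    git repack -a -d --window=2 --window-memory=64m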

Out of desperation I tried out Mercurial, but its maximum file size is 10 MB.

I'm thinking of maybe splitting my projects up: keeping the binary files in Subversion, since it seems OK with binary files, and the source code in Git. But this would be a huge task, and it sounds like a bad plan somehow.

What version control system do you use? How do you back it up? And what do you do with your binary files?

neoneye
  • 143
  • 5
  • 1
  • Max file size isn't 10 MB. Basically you need at least 2x the maximum file size available as memory. But VCSes are best used with text files; that's what they're optimized for. With hg you could look at the bfiles extension: http://mercurial.selenic.com/wiki/BfilesExtension – tonfa Dec 21 '09 at 23:20
  • I unfortunately had no luck installing the bfiles extension (I had little time that day), and I rejected Mercurial on the quick conclusion that there would probably be a lot of problems with it, since bfiles is not part of the default Mercurial install. My machine has 2 GB of RAM. Is the bfiles extension well tested? – neoneye Dec 26 '09 at 11:13

3 Answers

2

That's definitely a bad plan, splitting your version control systems!

We commit binary files through SVN every day where we are, and some of them are large ones too. Note that of course you can't diff a binary file, and if your binary file is 20 MB, every time you commit it you'll need another 20 MB of space on your Subversion server.

As for our backups, we just run an SVNDump nightly, then compress and upload it, similar to what you do.
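
A rough sketch of that kind of nightly job, assuming svnadmin dump (the repository path, FTP host and GPG key below are placeholders):

    # Dump, compress, encrypt and upload the repository
    STAMP=$(date +%Y%m%d)
    svnadmin dump -q /var/svn/myrepo | gzip > /tmp/myrepo-$STAMP.dump.gz
    gpg --encrypt --recipient backups@example.org /tmp/myrepo-$STAMP.dump.gz
    curl -T /tmp/myrepo-$STAMP.dump.gz.gpg ftp://user:password@ftp.example.org/svn-backups/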

All that said, this is perhaps a question better suited to the Stack Overflow guys, as they're heavier VC users than we are at SF!

(Don't stress about re-creating this question over there, if enough people agree it will be moved automatically).

Mark Henderson
  • 68,823
  • 31
  • 180
  • 259
  • SVNDump, do you mean "svnadmin dump" or the Perl module SvnDump? In the past I think I used "svnadmin hotcopy". Good to hear that splitting sounds like a bad idea; I was actually recommended it several times on IRC/Twitter, IIRC. – neoneye Dec 18 '09 at 07:22
  • Subversion uses the same binary difference algorithm for storing changes to ALL files. While you won't be able to diff binary files, subversion can store successive versions of a large binary file fairly compactly if only a small portion of the file changes in each version. – Stephen C. Steel Jun 09 '10 at 16:23
1

You want your 70 MB files to be backed up, right? If so, whatever scheme you use, you are going to be copying them across to the FTP site at least once. If they are compressible at all, git will likely do about as good a job of compressing them as is practically possible.

To avoid copying the entire repository to your FTP site for each backup, I would look at

  1. If possible, use "git push" to the backup site, or "git pull" from the backup site, if you can install git there. This will only send the objects that are missing, rather than repeatedly sending the 70 MB files. (There's a rough sketch of this after the list.)
  2. If that isn't possible, there is software that can do a differential send across "dumb" protocols like FTP. I was involved in writing one many years ago called 'syncftp' (syncftp.sf.net). My friend Simon (who wrote most of syncftp) went on to write another tool called 'gsync'. There's also 'sitecopy', http://www.manyfish.co.uk/sitecopy/. A caveat: you will want to configure git so that it doesn't repack files together too aggressively, since the copy process is file-based, and if a pack file changes even a little, the whole pack gets copied again. So you will lose some packing efficiency in the repository in order to gain some efficiency in your backup bandwidth usage.
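
Here's a rough sketch of option 1, assuming you can get SSH and git onto the backup host (the host name and paths are made up), plus the pack-size tweak mentioned in option 2:

    # One-time setup: a bare repository on the backup host as a remote
    ssh backup.example.org 'git init --bare /srv/backups/mygame.git'
    git remote add backup ssh://backup.example.org/srv/backups/mygame.git

    # Each backup run: only objects missing on the remote are transferred
    git push backup --all
    git push backup --tags

    # For the dumb-protocol route, cap pack sizes so a small change
    # doesn't force the sync tool to re-upload one huge pack file
    git config pack.packSizeLimit 20m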

(I'll save your CRLF files for another answer)

jmtd
  • 575
  • 2
  • 6
1

Regarding CRLF - look at git-config(1), specifically the option "core.autocrlf", which can be used to toggle conversion behaviour.
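
For example (files already committed with CRLF endings will still need to be re-committed once the setting is in place):

    # On Unix-ish machines: convert CRLF to LF on commit, don't touch files on checkout
    git config core.autocrlf input

    # On Windows: convert to LF on commit, back to CRLF on checkout
    git config core.autocrlf true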

jmtd
  • 575
  • 2
  • 6