I use a central Git server (a GitLab instance). Developers clone projects to a Samba share on another server. At the moment, I back up the GitLab server and all of the developers' directories on the other server, which costs a lot of disk space. Ideally, I would back up only the GitLab server, but I can't: changes not yet pushed to it could be lost, and I cannot shift responsibility for backups to the developers (e.g. by forcing them to push everything once a day). Is there a way to deduplicate this data with Git? I'm not sure that running another deduplication system before the actual backup would work, because I think even a small change in a repo could look like a big change to that system when Git packfiles change.
Which backup software are you currently using? I believe a number of the heavyweight ones support at least file-level deduplication, so the only files that would be duplicated are ones that have current edits in them (the complete repositories wouldn't take up much additional space). – Derek Pressnall Feb 17 '13 at 01:54
I use Bacula. What I'm worried about are the garbage-collected blobs. When all the blobs are packed together in packfiles and you add or change something, the packfiles will change, right? So from outside Git it will look like one big file that changed, and it will be backed up in full. – Chris Keschnat Feb 17 '13 at 16:17
1 Answer
It depends on how you back up. If you use Git itself for the backup, it's easy: add a remote for each of the developers' repos and run git fetch for all of them. Git then does the deduplication for you by storing each object only once, no matter how many remotes have it.
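The approach above could be scripted roughly as follows. This is only a sketch in Python, assuming one bare backup repository per project and that every developer clone is reachable by path on the Samba server. The paths, the remote names, and the helpers `fetch_commands` and `run_backup` are hypothetical; the only git commands used are `git init --bare`, `git remote add`, and `git fetch`.

```python
import subprocess
from pathlib import Path


def fetch_commands(backup_repo, clones):
    """Build the git commands that pull every developer clone into one backup repo.

    `clones` maps a remote name (for example the developer's login) to the
    path of that developer's clone. Both are assumptions made for this sketch.
    """
    commands = []
    for name, clone_path in sorted(clones.items()):
        # Register the developer's clone as a remote of the backup repo.
        commands.append(
            ["git", "-C", str(backup_repo), "remote", "add", name, str(clone_path)]
        )
        # Fetch its branches into refs/remotes/<name>/* of the backup repo.
        commands.append(["git", "-C", str(backup_repo), "fetch", name])
    return commands


def run_backup(backup_repo, clones):
    """Create the bare backup repo if needed, then run all fetches."""
    backup_repo = Path(backup_repo)
    if not backup_repo.exists():
        subprocess.run(["git", "init", "--bare", str(backup_repo)], check=True)
    for command in fetch_commands(backup_repo, clones):
        # "remote add" exits non-zero when the remote is already configured,
        # so only a failing fetch is treated as an error.
        subprocess.run(command, check=(command[3] == "fetch"))
```

A call such as `run_backup("/backup/proj.git", {"alice": "/srv/share/alice/proj"})` (hypothetical paths) would then be run from cron. Note that a fetch only copies committed work on branches; uncommitted working-tree changes and stashes in a developer's clone are not captured this way.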

Dennis Kaarsemaker
All the refs (branches) are there, so you simply clone the backup and check out your branches. – Dennis Kaarsemaker Feb 16 '13 at 19:50
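For the restore side, here is a hedged sketch in Python. It assumes the backup was filled by running git fetch against one remote per developer, so that a developer's branches sit under `refs/remotes/<name>/*` in the backup repository. A plain `git clone` of the backup copies only its own branches and tags, so the sketch fetches that namespace explicitly. The paths, the developer name, and the helper `restore_commands` are hypothetical.

```python
def restore_commands(backup_repo, target, developer, branch):
    """Build the git commands that restore one developer branch from the backup."""
    # Copy the developer's remote-tracking refs from the backup repo as they are.
    refspec = f"+refs/remotes/{developer}/*:refs/remotes/{developer}/*"
    return [
        # Start from an empty repository at the restore location.
        ["git", "init", target],
        # Fetch the developer's refs (and the objects they need) from the backup.
        ["git", "-C", target, "fetch", backup_repo, refspec],
        # Create a local branch from the restored ref and check it out.
        ["git", "-C", target, "checkout", "--no-track", "-b", branch,
         f"{developer}/{branch}"],
    ]
```

Each returned list can be passed to `subprocess.run(..., check=True)` in order. Running this against a scratch directory from time to time is one way to test that the backup can actually be restored.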
I just don't consider any backup strategy complete without a restore strategy - and tests. :) – Michael Hampton Feb 16 '13 at 19:51
This might be a good idea. I will need to write some scripts to detect new projects and so on, but I think it could work. Thank you. – Chris Keschnat Feb 17 '13 at 16:24
I came up with http://pastebin.com/xVPyJ0Ki and will monitor whether it does everything I want. – Chris Keschnat Feb 25 '13 at 13:22