
I want to use git as a means of backing up and having a history of changes for configuration files of software that I'm using on my desktop. This would be a local repository with a local installation of git.

I am however unsure of these:

  • Is it possible to add a file to a repository as if it were at a different path than its real one?

    My concern is that if I add files naively, by copying them to the repository and updating the repo, I'd end up using double the space for each file. While it isn't going to be a lot of space, I would still prefer a cleaner solution. I've looked into git-add, git-mv and git-filter-branch but they don't seem to provide a clean solution to this. Both add and mv mechanics can be used to accomplish this task, but don't really get around the problem of duplicating files. filter-branch seems way too big of a hammer for this task.

  • Is it possible to change the path of a file after it is added?

    git-filter-branch seems capable of doing this, but I'm unsure of the side-effects.

  • Would symlinks or hardlinks work to circumvent the need to copy the files, assuming git isn't capable of what I need explicitly? Would this work cross-platform, or does git handle links differently on different platforms?

Edit in 2017: I accepted the answer I got because, now that I understand git better, I realize there isn't a better solution than copying files or just having a git repo at a grandparent directory. The inelegance of the latter means copying files is the optimal solution, and symlinks/hardlinks are irrelevant to this problem. The comments and answer may be useful to someone looking for something similar but not the same, so I encourage checking those out.

mechalynx
  • Which operating systems are you trying to target? And which configuration files? Your 'dotfiles' (`~/.bash_profile` and the like)? – nwinkler Aug 11 '15 at 12:49
  • @nwinkler I'm hoping for a solution that would work transparently on windows and linux (or at least one that can be scripted to behave transparently). Config files would include dotfiles (such as `.vimrc` and `vimfiles`) as well as various other config files (e.g. for GIMP, Firefox etc.). – mechalynx Aug 11 '15 at 12:52
  • Take a look at my answer over here: http://stackoverflow.com/a/31848307/1228454 - it's not a 100% duplicate, but might give you some insights. There are tools for this (like Homesick, which I mentioned over there), but not everything might work on Windows (unless you use Git-Bash or Cygwin). – nwinkler Aug 11 '15 at 12:55
  • @nwinkler Very interesting. Unfortunately, as you say in the comments, it's a "buy vs build" solution and I intend this to be part of a broader, cross-platform, python-based automation framework I intend to build, so it's a bit too specific and too clunky for my use. The OP's solution is also clever, but I looked `git config` up and it seems that user got lucky with `core.worktree` while I need a more generalized approach. Seems like I'll have to go with either copied files or some much more elaborate operation. Thanks for the info though! :) – mechalynx Aug 11 '15 at 13:37
  • Fair enough. Take a look at http://dotfiles.github.io/ as well, there are many more solutions around the management of dotfiles there. Maybe there's one that will work for you... – nwinkler Aug 11 '15 at 13:40
  • @nwinkler I did look at it briefly, but it's specific to dotfiles which tend to reside in one directory or are clumped together - in my case, these files could be strewn all around the place, especially with the nightmarish organization of the file system on windows. I'll leave the question open just in case, but I think I'll need to just copy the files and commit them that way - not ideal, but it's at least simple and guaranteed to work. – mechalynx Aug 11 '15 at 13:48
  • etckeeper is a package to keep history of configuration files under Linux /etc. Its README mentions some pitfalls when maintaining configuration files under git. And of course its implementation is open source, so you can modify it for your needs. – Uwe Geuder Aug 12 '15 at 06:35
  • @UweGeuder Interesting, but it still copies the files before committing, hence the pitfalls mentioned - copying was what I wanted to avoid in the first place so, while relevant in some ways, it doesn't solve the problem. Thanks for the input though :) – mechalynx Aug 12 '15 at 12:13
  • @ivy_lynx I'm not sure what you mean exactly by copying the files. When using etckeeper there is only one copy of the file: /etc/foo is the copy used by Linux as a configuration file and at the same time it is in the working area of git. (Of course git has the same data once more in its object store, but a.) that's compressed and/or stored as delta and b.) you don't really have version control if you optimize away that data) – Uwe Geuder Aug 13 '15 at 17:02
  • @UweGeuder I misunderstood etckeeper's readme. However, if it uses /etc or /etc/foo as its working directory, it has the same issue as Homesick does, namely that while in those cases simply changing the working directory is sufficient, in my case there is no single working directory that would work. I need to get files from all over the hard drive and checking the entire partition into version control would be much worse than copying a few files. I could use multiple repos, but having entire repos for single files would be just as much of a waste. – mechalynx Aug 13 '15 at 17:24

1 Answer


From the discussion in the comments I conclude that the question really is:

In a big directory hierarchy (my complete home directory) I want to put selected files (configuration files) under git version control while most other files ("normal" files) are not in version control.

I have never done that, but 2 options come to my mind:

1.) Make the whole directory hierarchy a git working area, but use .gitignore rules to exclude all files unless they are explicitly included. You then need to list the configuration directories or files that should be version controlled.

Here is a demo script that builds a small example directory hierarchy. Everything is excluded from git version control except the configuration directories/files that are explicitly included. (You can guess from the names which ones they are.)

#! /bin/sh
time=$(date +%H-%M-%S)
mkdir example-$time
cd example-$time

touch a b c
touch conf1

mkdir d1
touch d1/a
touch d1/b
touch d1/conf

mkdir d2
touch d2/f1
touch d2/f2

mkdir conf-d
touch conf-d/conf1
touch conf-d/conf2

git init

cat >.gitignore <<EOF
# ignore all files
*

# but not directories (if a directory is once ignored, git will never
# look what is inside)
!*/

# of course .gitignore must never be ignored
!.gitignore

# list configuration files
!conf1
EOF

cat >d1/.gitignore <<EOF
# list the configuration files in this "mixed" directory
!conf
EOF

cat >conf-d/.gitignore <<EOF
# this is a configuration directory, include everything...
!*
# ... but ignore editor backup files
*~
EOF

git add -A
git status

I have never done this in real life, but from the example it seems to work. However, when you have other git repos in your directory tree, you probably need to include them as submodules. There are all kinds of pitfalls related to submodules, so it might get quite tricky in the end.
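To double-check what such whitelist rules actually let through, something like the following might help. It is only a sketch: it uses a throwaway directory from `mktemp -d` and a reduced version of the hierarchy (both are assumptions for illustration), then asks git which files it staged and whether one of the "normal" files is ignored.

```shell
#!/bin/sh
# Sketch: confirm which files a whitelist-style .gitignore lets through.
set -eu
demo=$(mktemp -d)          # throwaway directory, an assumption for this demo
cd "$demo"
git init -q .

mkdir d1
touch a conf1 d1/a d1/conf

cat >.gitignore <<'EOF'
*
!*/
!.gitignore
!conf1
EOF

cat >d1/.gitignore <<'EOF'
!conf
EOF

git add -A

# List what ended up in the index
tracked=$(git ls-files)
echo "$tracked"

# check-ignore exits 0 when the path is ignored, non-zero otherwise
if git check-ignore -q d1/a; then a_state=ignored; else a_state=tracked; fi
echo "d1/a is $a_state"
```

`git check-ignore -v <path>` additionally reports which .gitignore file and line decided the outcome, which can be handy when a file unexpectedly stays ignored.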

2.) You did not mention what filesystem you are on. If you are on a Linux filesystem you could use hardlinks. Create a git repo for your backup somewhere and add hardlinks for all configuration files.

This demo script shows the idea:

#! /bin/sh
time=$(date +%H-%M-%S)
mkdir hl-example-$time
cd hl-example-$time

touch a b c
touch conf1

mkdir d1
touch d1/a
touch d1/b
touch d1/conf

mkdir d2
touch d2/f1
touch d2/f2

mkdir conf-d
touch conf-d/conf1
touch conf-d/conf2

mkdir hl-backup
cd hl-backup
git init

ln ../conf1 .

mkdir d1
ln ../d1/conf d1

mkdir conf-d
ln ../conf-d/* conf-d

git add -A
git status

Again, I have not done this in real life, and hardlinks always have their pitfalls. A program might unlink its existing configuration file and create a new one with the same name instead of updating the existing one (actually that might be a good implementation because it helps to avoid corrupted configuration files). So you would need a script that checks that all configuration files are still hard-linked. And because you cannot hard-link directories, you would need a script that searches for new configuration files in configuration directories. I don't know exactly how git behaves on checkout, so before having git modify (e.g. restore) a hard-linked backup file, make sure you have another level of backup. Don't try this on a production system unless you really know what you are doing.
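The check for broken hardlinks mentioned above could look roughly like this. It is a minimal sketch, not something I have run against real configuration files: the file names and the `still_linked` helper are made up for the demo, and it relies on the `-ef` test operator, which is not in POSIX but is supported by common shells such as bash and dash. Note that `-ef` is also true for a symlink pointing at the same file.

```shell
#!/bin/sh
# Sketch: detect backup entries whose hardlink to the live file was broken.
set -eu
demo=$(mktemp -d)          # throwaway directory, an assumption for this demo
cd "$demo"
mkdir backup
echo v1 >conf1
echo v1 >conf2
ln conf1 backup/conf1
ln conf2 backup/conf2

# Simulate a program that replaces its config by writing a new file
# and renaming it over the old one; this breaks the hardlink.
echo v2 >conf2.tmp
mv conf2.tmp conf2

# still_linked BACKUP ORIGINAL (hypothetical helper):
# succeeds if both names refer to the same file
still_linked() {
    [ "$1" -ef "$2" ]
}

broken=""
for f in conf1 conf2; do
    if ! still_linked "backup/$f" "$f"; then
        broken="$broken $f"
        # Re-create the link so the backup follows the live file again
        ln -f "$f" "backup/$f"
    fi
done
echo "re-linked:$broken"
```

Running such a check before every commit would at least make sure the repository sees the current content of each file.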

Uwe Geuder
  • Thanks for the effort, I up-voted because this was a relevant and well-written answer, but the question isn't quite the one you figured out - it's closer to: can I get git to efficiently and cleanly version control arbitrary files as a set? Your solution would work and is pretty much what Homesick and etckeeper do (closer to Homesick I think). – mechalynx Aug 15 '15 at 10:52
  • However, I can avoid this complexity by writing a script to copy files to the repo, then commit. My concern was entirely about avoiding the copy, while _still_ using git solely for the process. It doesn't seem like it's avoidable though and I prefer copies. I'll just need to handle large files on a per-case basis (if that _ever_ is an issue, I'm just being careful, trying to have some way to build my backup system without having to refactor it heavily in the future). – mechalynx Aug 15 '15 at 10:53
  • Accepted this as an answer to close the question since years later, my understanding of git tells me what the original question wanted is not possible at all and this is the closest you can get. – mechalynx Aug 14 '17 at 14:06
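For anyone landing here later, the copy-then-commit approach discussed in the comments above might be sketched like this. Everything specific in it is an assumption for illustration: the temporary `home` and `backup` directories, the two sample files, and the placeholder committer identity. A real script would read the list of files from somewhere else.

```shell
#!/bin/sh
# Sketch: copy selected files into a backup repo, then commit the snapshot.
set -eu
demo=$(mktemp -d)
src="$demo/home"        # stands in for wherever the config files live
repo="$demo/backup"     # the backup repository
mkdir -p "$src/app" "$repo"
echo "set number" >"$src/.vimrc"
echo "theme=dark" >"$src/app/settings.ini"
git init -q "$repo"

# Hypothetical list of files to back up, relative to $src
for rel in .vimrc app/settings.ini; do
    mkdir -p "$repo/$(dirname "$rel")"
    cp -p "$src/$rel" "$repo/$rel"
done

cd "$repo"
git add -A
# Only commit when the staged content actually changed
if ! git diff --cached --quiet; then
    git -c user.name="backup" -c user.email="backup@example.invalid" \
        commit -q -m "Snapshot of configuration files"
fi
count=$(git rev-list --count HEAD)
echo "commits: $count"
```

The copy does duplicate each file in the working area, but it keeps the repository independent of where the originals live, which is the trade-off accepted in the question's edit.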