
We have a fairly huge SVN repository that I access using git. So far everything was working fine. Now someone accidentally added some huge chunks of binary data to the SVN repository (hundreds of MB!). Of course git also sucked in those chunks (using git-svn).

Is there a way to selectively remove some files from git without disturbing the synchronization with the svn repository?

oliver

3 Answers


I believe rewriting git history using git filter-branch --tree-filter "rm -rf unwanted_dir" wouldn't disturb the synchronization—as far as I can tell, git-svn only depends on the git-svn-ids in the commit messages, which should stay the same. I haven't tried it, though. ☺
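The premise here (filter-branch leaves commit messages, and therefore the git-svn-id lines, untouched) can be checked without any SVN server. Below is a minimal sketch, assuming a POSIX shell and a git that still ships filter-branch: it builds a throwaway repository whose commit messages carry fake git-svn-id lines (the URL and UUID are hypothetical), runs the same tree filter, and compares the messages before and after. It only tests that the messages survive. The commit IDs still change, and git-svn itself is not exercised, so it does not settle whether syncing keeps working.

```shell
#!/bin/sh
# Scratch-repo check that filter-branch keeps commit messages intact.
# The git-svn-id lines below are fake (hypothetical URL and UUID);
# no svn server or git-svn is involved.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git pauses with a warning otherwise

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo hello > README
git add README
git commit -q -m "initial import

git-svn-id: http://svn.example.com/repo/trunk@1 00000000-0000-0000-0000-000000000000"

mkdir unwanted_dir
head -c 100000 /dev/urandom > unwanted_dir/blob.bin
git add unwanted_dir
git commit -q -m "accidental binary

git-svn-id: http://svn.example.com/repo/trunk@2 00000000-0000-0000-0000-000000000000"

before=$(git log --format=%B)

# The rewrite from the answer: drop unwanted_dir from every commit on HEAD.
git filter-branch -f --tree-filter "rm -rf unwanted_dir"

after=$(git log --format=%B)
old=$(git for-each-ref --format='%(objectname)' refs/original/)   # pre-rewrite tip
new=$(git rev-parse HEAD)                                         # rewritten tip

echo "old tip: $old"
echo "new tip: $new"
```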

legoscia
  • sounds a little scary...but still reasonable. i will give it a try with a cloned repo:) thanks for the hint – oliver Oct 07 '09 at 08:26

You could try using git svn's '--ignore-paths' option to specify the names of the binaries that were added. You'd probably need to do a 'git svn reset' to go back to the revision in which they were added, and then re-fetch with those paths filtered out.

AlBlue
  • NB this would have the effect of changing the history from the point of the SVN commit onwards, so other Git users would need to rebase off it after you've done this. – AlBlue Oct 05 '09 at 18:53
  • unfortunately this is not an option since i cannot change the svn history anymore – oliver Oct 07 '09 at 08:24

Almost forgot about this... sorry.

As it turns out, there is no easy solution to the problem I described. I experimented with several options, but each has drawbacks. Nevertheless, maybe this is helpful if anybody else has the same problem:

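Delete the unwanted file/folder from the git history

git filter-branch -f --tree-filter "rm -f hugefile.bin"

(Use rm -f rather than a [ -f hugefile.bin ] && rm hugefile.bin test: filter-branch aborts when the tree filter exits non-zero, which that test does on every commit that lacks the file.)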

Pros:

  • effectively removes the file from your repository

Cons:

  • you will have to clean up your repository, because the old commits are still in the git repo. Either run something along the lines of git gc --prune=now (after deleting the backup refs under refs/original/ and expiring the reflog, which otherwise keep the old commits reachable), or just clone your repository (a clone will by default not include your remote svn branch)
  • the rewritten branch will no longer be synchronized with svn: if you do another git svn fetch, git will still fetch the unchanged history
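A sketch of the removal plus cleanup on a scratch repository, assuming a POSIX shell and a git that still ships filter-branch. It swaps the tree filter for an --index-filter with git rm --cached --ignore-unmatch (a documented filter-branch idiom that avoids checking out every commit) and follows the cleanup steps from the "checklist for shrinking a repository" in the git-filter-branch manual. Apart from hugefile.bin, the file names and commit messages are made up; no svn remote is involved.

```shell
#!/bin/sh
# Scratch-repo sketch: remove hugefile.bin from every commit on the current
# branch, then purge the old objects so the repository actually shrinks.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git pauses with a warning otherwise

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "keep me" > small.txt
git add small.txt
git commit -q -m "small file"

head -c 2000000 /dev/urandom > hugefile.bin   # incompressible stand-in for the binary
git add hugefile.bin
git commit -q -m "accidental binary"
blob=$(git rev-parse HEAD:hugefile.bin)       # remember the blob ID to check later

echo "more" >> small.txt
git commit -q -am "later change"

# Drop the file from each commit's index; --ignore-unmatch keeps git rm
# from failing on commits that never contained it.
git filter-branch -f --index-filter 'git rm -q --cached --ignore-unmatch hugefile.bin'

# filter-branch keeps the old history under refs/original/, and the reflog
# still points at it; remove both, then prune unreachable objects.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now

du -sh .git   # size after cleanup
```

Skipping the refs/original and reflog steps is the usual reason git gc --prune=now appears to do nothing after a rewrite.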

Cut off the svn history when initially cloning

git svn clone -r N:HEAD http://yoursvnaddress myPartlyClonedRepo.git

where N is the earliest svn revision that gets fetched; everything from N up to HEAD is synced.

Pros:

  • enables you to keep the size of your repository small (what I wanted in the first place)

Cons:

  • earlier history is "lost"

Sparse checkout

This is a fairly recent addition (git 1.7) and lets you choose which files are checked out into your working directory

git config core.sparsecheckout true
echo "/*" > .git/info/sparse-checkout
echo '!path-to-huge-unwanted-dir/' >> .git/info/sparse-checkout
git read-tree -m -u HEAD

Pros:

  • easy setup

Cons:

  • does not reduce the size of your object database (.git); the files are only left out of the working directory
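A minimal sketch of the sparse-checkout setup on a scratch repository (directory and file names are hypothetical), assuming a POSIX shell. It uses /* rather than * as the first pattern, following the negative-pattern example in the git documentation, which anchors the match at the top level. The last two commands illustrate the point under Cons: the directory disappears from the working tree, but its content is still stored in .git.

```shell
#!/bin/sh
# Scratch-repo sketch of excluding one directory via sparse checkout.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir -p src path-to-huge-unwanted-dir
echo "int main(void) { return 0; }" > src/main.c
head -c 1000000 /dev/urandom > path-to-huge-unwanted-dir/blob.bin
git add .
git commit -q -m "import"

# Enable sparse checkout: keep everything at the top level except
# the unwanted directory, then re-apply the patterns to the worktree.
git config core.sparsecheckout true
mkdir -p .git/info
echo "/*" > .git/info/sparse-checkout
echo '!path-to-huge-unwanted-dir/' >> .git/info/sparse-checkout
git read-tree -m -u HEAD

ls                                 # unwanted directory is no longer checked out
git ls-tree -r --name-only HEAD    # but it is still part of the committed tree
```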
oliver