
I have a large (~2GB), old (15+ years) git repo that has been converted from CVS to SVN to git over those years. We are changing our hosting from on-prem to cloud, so I want to take this opportunity to clean up the repo. I want to purge old files that have since been deleted from the branches but still live in the history, in order to reduce the overall clone size/time.

There are hundreds of branches that I don't care about anymore. I am really only interested in preserving a few (~10) branches.

I've tried BFG Repo-Cleaner with `--strip-blobs-bigger-than 1M --protect-blobs-from <my refs>`. It seems to match my use case very well: I don't want to remove any files that are currently present in the HEAD of my selected branches, regardless of their size. However, it doesn't deal with the changed commit hashes very nicely, other than producing a mapping file.

I've also tried `git filter-repo` with `--strip-blobs-bigger-than 1M`. This uses replace-refs so that I can still reference commits by their old hash, which is really important. However, it breaks my current branches by deleting files I don't want to remove.

It seems like `git filter-repo` is the tool I should be using. However, I don't want to manually list all of the files I want to delete (or, conversely, the files I want to keep). Is there a better way to do this?

wolfcastle
  • Maybe you could merge all your remaining branches into one temporary branch, then delete what's not in it. – isherwood Jun 22 '21 at 15:44
  • I'm a little confused, though. Files come and go in a repo with every change of the current branch. Why do you need to "keep" any files? Are they in .gitignore and just sitting around? – isherwood Jun 22 '21 at 15:46
  • @isherwood `git filter-repo` will remove ALL files larger than the given threshold. If I have a file that currently exists in HEAD that I need for my build, but it's larger than the threshold, it will get removed. I can't have that. I'm mainly interested in removing large files that _used_ to be in the repo but have since been deleted. – wolfcastle Jun 22 '21 at 16:04
  • I wasn't proposing any particular solution. If you create a new branch, delete all your files, and look at the repo they'll be gone. If you then check out master, they'll be back. I'm not clear on your goals in general. You don't mention anything about file size in your question other than with the commands you show. – isherwood Jun 22 '21 at 16:06
  • Use `git filter-repo` but write a bit of Python code to choose which blobs to strip: those larger than 1M or whatever size you choose, but not any specific ones (by hash ID or, if filter-repo supports this, path name within the particular commits that refer to it). Alternatively, use the BFG, then create your own replace refs (this is quite straightforward). – torek Jun 22 '21 at 19:08
  • You can use `--refs` in `git-filter-repo` to rewrite only the old references; another way could be to split your history in two and push the older and newer parts to two separate repos. – LeGEC Jun 23 '21 at 01:38
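
A rough sketch of the idea in torek's comment, in case it helps: rather than hooking into filter-repo itself, one could compute the list of blob IDs to strip up front and hand it to `git filter-repo --strip-blobs-with-ids <file>`. Everything named here is an assumption for illustration: the helper names, the output file name, the branch names in the usage note, and reading "1M" as 1 MiB. It protects only blobs present at the tips of the kept refs, which is how I read the requirement in the question.

```python
import subprocess

ONE_MIB = 1024 * 1024  # assumption: "1M" in the question means 1 MiB


def git(*args):
    """Run a git command in the current repository and return its stdout."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout


def parse_ls_tree(output):
    """Collect blob IDs from `git ls-tree -r <ref>` output.

    Each line looks like: <mode> <type> <oid><TAB><path>
    Submodule entries (type "commit") are ignored.
    """
    blobs = set()
    for line in output.splitlines():
        meta, _, _path = line.partition("\t")
        fields = meta.split()
        if len(fields) == 3 and fields[1] == "blob":
            blobs.add(fields[2])
    return blobs


def parse_batch_check(output):
    """Map blob ID -> size from lines of the form '<oid> <type> <size>'."""
    sizes = {}
    for line in output.splitlines():
        oid, objtype, size = line.split()
        if objtype == "blob":
            sizes[oid] = int(size)
    return sizes


def select_blobs_to_strip(sizes, protected, limit=ONE_MIB):
    """Blobs larger than `limit` that are not present in any protected tip."""
    return sorted(
        oid for oid, size in sizes.items()
        if size > limit and oid not in protected
    )


def write_strip_list(keep_refs, out_path, limit=ONE_MIB):
    """Write one blob ID per line to `out_path` and return the list."""
    protected = set()
    for ref in keep_refs:
        protected |= parse_ls_tree(git("ls-tree", "-r", ref))
    sizes = parse_batch_check(git(
        "cat-file", "--batch-all-objects",
        "--batch-check=%(objectname) %(objecttype) %(objectsize)",
    ))
    to_strip = select_blobs_to_strip(sizes, protected, limit)
    with open(out_path, "w") as fh:
        fh.write("".join(oid + "\n" for oid in to_strip))
    return to_strip
```

Run in a fresh clone, something like `write_strip_list(["main", "release-1.0"], "blobs-to-strip.txt")` (hypothetical branch names) would produce the file, and `git filter-repo --strip-blobs-with-ids blobs-to-strip.txt` would then do the rewrite. Deleting the unwanted branches before the rewrite is probably still needed, and it is worth checking the resulting list by hand before rewriting anything.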

0 Answers