I have a large (~2 GB), old (15+ years) git repo that has been converted from CVS to SVN to git over those years. We are changing our hosting from on-prem to cloud, so I want to take this time to clean up the repo. I want to remove old files from the history that are no longer present in the current branches, in order to reduce the overall clone size and time.
There are hundreds of branches that I don't care about anymore. I am really only interested in preserving a few (~10) branches.
I've tried BFG Repo-Cleaner with `--strip-blobs-bigger-than 1M --protect-blobs-from <my refs>`. It seems to match my use case very well: I don't want to remove any files that are currently present in the HEAD of my selected branches, regardless of their size. However, it doesn't deal with the changed commit hashes very nicely, other than producing a mapping file.
I've also tried `git filter-repo` with `--strip-blobs-bigger-than 1M`. This uses replace-refs so that I can still reference commits by their old hashes, which is really important. However, it breaks things in my current branches by deleting files I don't want to remove.
It seems like `git filter-repo` is the tool I should be using. However, I don't want to manually list all of the files I want to delete (or, conversely, the files I want to keep). Is there a better way to do this?
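To make the "protected" set concrete, here is a rough sketch of how I would describe it with plain git: every blob that is present at the tip of one of the branches I want to keep. The function name `protected_blobs` and the branch names in the usage comment are placeholders for illustration, not my real refs, and this only lists blob IDs; it does not rewrite anything.

```shell
#!/bin/sh
# Sketch: print the unique blob IDs present at the tips of the given refs.
# `protected_blobs` is a hypothetical helper name.
# Usage (branch names are placeholders): protected_blobs main release-1.0
protected_blobs() {
    for ref in "$@"; do
        # `git ls-tree -r` recurses into subdirectories and prints
        # "<mode> <type> <oid><TAB><path>" for each entry.
        # Keep only blobs (this skips submodule "commit" entries).
        git ls-tree -r "$ref" | awk '$2 == "blob" { print $3 }'
    done | sort -u
}
```

Any blob whose ID appears in this output is one I would expect a size-based strip to leave alone, whatever its size.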