I have a repository that has grown so big that it has become unusable: it is over 2GB and takes too long to clone. I now want to shrink it, but still be able to go back to some specific old versions. Shrinking will involve rewriting history, and I'm fine with that. People with clones will have to rebase, cherry-pick, or copy their files on top of a new branch in a clone of the new repo.
- I have binary files in this repository, but I need them there (think of them as mandatory resources for the software to run). So I cannot really use filter-branch or BFG to remove big binary files, since I may need them when reverting to past commits.
- I do not care about old, already merged branches (for example, feature branches), but I do care about some specific commits (for example, the heads of past release branches).
- Since I'll be modifying (~many~) very old commits, I have no idea how to properly resolve merge conflicts (as can happen with a basic rebase or cherry-pick), so I'm looking for a solution that doesn't produce any conflicts, or produces only conflicts that can be resolved automatically.
- I want to preserve all current branches, so people who have work in progress in a clone can rebase or copy their changes onto them.
- I want the history between my new commits to match the history of the old repo (as if the commits in between had been squashed). Each current branch's history will start from one of these old squashed commits.
I think of it as a squash of the unneeded old repository history. What I have come up with so far as a possible process for my case (I am missing some steps and am still unsure it will do what I think) is:
- Clone a mirror of the existing repo.
- Create orphan branches from the old commits I want to keep. This will create parentless squashed commits with all the needed files in them.
- Somehow link them to recreate the old repo's history => How? Merge, rebase, or reset + commit the orphans?
- Cherry-pick each current branch's list of commits (using ranges) and apply them on top of the latest squashed commit that contains the parent of their first divergent commit => How can I automatically find which commit to apply a cherry-picked range to? Will that work without conflicts?
- Move tags to the new tree. Remove the previous tree. Run git garbage collection (git gc).
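To make the steps above concrete, here is a rough sketch of what I have in mind, run in a throwaway demo repository. It is not a tested solution for a real repo. It uses `git commit-tree` to create new commits that reuse existing trees, so nothing is re-applied as a patch and no conflict can occur. The branch names (`main`, `feature`), the tags (`v1.0`, `v2.0`), and the `$map` directory are hypothetical. It assumes every commit after the last kept one has only parents that are either that kept commit or themselves rewritten, and it does not preserve the original author and dates.

```shell
#!/bin/sh
set -eu

# Throwaway demo repository (all names here are hypothetical).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

# Helper: append a line to a file and commit it.
commit() { echo "$1" >> file.txt; git add file.txt; git commit -q -m "$1"; }

commit c1; commit c2; git tag v1.0      # old release head to keep
commit c3; commit c4; git tag v2.0      # old release head to keep
commit c5
git checkout -q -b feature; commit f1   # a current branch
git checkout -q main;       commit c6   # another current branch

old_main_tree=$(git rev-parse 'main^{tree}')

# $map/<old commit id> holds the id of the commit that replaces it.
map=$(mktemp -d)

# 1. One squashed commit per kept point, chained oldest to newest.
#    Each one reuses the tree of the old commit unchanged.
parent=""
for old in v1.0 v2.0; do
    old_id=$(git rev-parse "$old^{commit}")
    new=$(git commit-tree "$old_id^{tree}" ${parent:+-p $parent} \
          -m "Squashed history up to $old")
    echo "$new" > "$map/$old_id"
    parent=$new
done

# 2. Recreate every commit after the last kept point, parents first,
#    with the same tree and message but remapped parents.
last=$(git rev-parse 'v2.0^{commit}')
for old in $(git rev-list --reverse --topo-order main feature "^$last"); do
    parents=""
    for p in $(git rev-list --parents -n 1 "$old" | cut -d' ' -f2-); do
        parents="$parents -p $(cat "$map/$p")"
    done
    # $parents is left unquoted on purpose so it splits into arguments.
    new=$(git log -1 --format=%B "$old" | git commit-tree "$old^{tree}" $parents)
    echo "$new" > "$map/$old"
done

# 3. Resolve all new targets first, then move branches and tags.
new_main=$(cat "$map/$(git rev-parse main)")
new_feature=$(cat "$map/$(git rev-parse feature)")
new_v1=$(cat "$map/$(git rev-parse 'v1.0^{commit}')")
new_v2=$(cat "$map/$(git rev-parse 'v2.0^{commit}')")
git update-ref refs/heads/main "$new_main"
git update-ref refs/heads/feature "$new_feature"
git tag -f v1.0 "$new_v1" > /dev/null
git tag -f v2.0 "$new_v2" > /dev/null

# 4. Drop the old history.
git reflog expire --expire=now --all
git gc -q --prune=now

# Sanity check: the content of main is unchanged by the rewrite.
[ "$(git rev-parse 'main^{tree}')" = "$old_main_tree" ] && echo "main content unchanged"
```

In the demo, `main` goes from six commits to four (two squashed commits plus the two commits made after `v2.0`), and `feature` still forks from the rewritten copy of the same commit. What I do not know is whether this approach holds up on a real repository with merges that reach back before the last kept commit.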
Is this feasible without any conflicts? Will it work in all cases (a git commit graph can be pretty complex)? Is there a better solution to safely and automatically squash history?
It seems to me that this type of maintenance task is bound to come up in any long-running project, so I'm assuming other big projects have already used some kind of solution. But I guess there could be an option to git init (or another command) that I am not aware of, to create a new repo from an old repo for this use case?
Update: I found the beginning of a solution here: https://wincent.com/wiki/Editing,_amending,_or_squashing_the_root_commit_in_a_Git_repository But I would like to do this multiple times in my history, in a fully automatic way (i.e. without conflicts).