
Is there any way I can test how deleting several branches will affect the size of a git repo? I have a git repo with 10 long-lived branches and hundreds of short-lived feature branches.

The repo is very large and slow to clone. GitLab shows 1.4 GB as the total size, git-sizer shows a blob total size of 8.4 GB.

I have tried tools such as BFG Repo-Cleaner to remove unwanted history, but I've only managed a reduction from 1.4 GB to 1 GB, which is not really sufficient.

My strategy is to split my repo in two and also to use multiple forks to separate streams of work. I will also add file extension and size restrictions on pushes in GitLab to stop junk being pushed. We've had problems with .rar, .zip and .jar files, personal backups pushed by mistake, etc. However, I also want to know if I can retain the history of my long-lived branches. So my questions:

  1. Is it possible to see how much the size would be reduced if I deleted all the short lived branches?
  2. Can this be achieved locally?
  3. Or do I just have to duplicate everything to a test repo on GitLab and delete the branches on this test repo?
ptha

2 Answers

1

Answering your three questions: yes, it is possible to check this locally, depending on what you mean by "if I deleted all the short lived branches" (I'll assume they are not merged).

You can clone your repo, create local branches for the branches you want to keep, remove the remote, and gc aggressively.
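Only branches that are not merged into something you keep can free objects when they are deleted, so it may be worth listing those first. A minimal sketch, assuming the remote is called origin and the branch you keep is origin/master (both are assumptions; unmerged_remote_branches is a hypothetical helper name, not a git command):

```shell
#!/usr/bin/env bash
# Hypothetical helper: list remote-tracking branches that are NOT fully
# merged into the branch given as $1 (assumed default: origin/master).
unmerged_remote_branches() {
  local keep="${1:-origin/master}"
  git branch -r --no-merged "$keep" |
    grep -v -- '->' |   # drop symbolic entries such as "origin/HEAD -> origin/master"
    sed 's/^ *//'       # strip the leading indentation git adds
}
```

Running unmerged_remote_branches origin/master inside a clone prints one branch per line. Branches missing from that list are already contained in the kept branch, so deleting them should not free any space.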

However, I don't think that'll help as much as using BFG Repo-Cleaner to remove large binary files. If you need to track those binary files, you should switch to Git LFS for them.

Your experience with BFG seems strange; I'd expect a much better reduction if you cleaned your repo properly.
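One way to check what is still taking up space is to list the largest blobs reachable from any ref. A rough sketch using standard git plumbing (largest_blobs is a hypothetical helper name, and the default of 20 entries is arbitrary):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the N largest blobs reachable from any ref,
# one per line as "blob <size-in-bytes> <object-id> <path>", biggest first.
largest_blobs() {
  local count="${1:-20}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk '$1 == "blob"' |   # ignore commits and trees
    sort -k2,2 -n -r |     # sort by size, descending
    head -n "$count"
}
```

Calling largest_blobs 10 in the clone shows the ten biggest files anywhere in history, which may help decide what to pass to BFG next. Sizes are uncompressed, so they will not add up to the on-disk size of the repo.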

A. M.
  • The branches have been merged, they are mostly stale feature branches. I had to run BFG multiple times (sometimes with the same parameters, as it would find more to delete) deleting by filenames, blob IDs etc. I saw it increased the number of commits a lot, but only cleaned about 400 MB. It's possible I could be more aggressive with what I want to remove. But I really want to get to a new clean slate, as that may be the only way to get the required size reduction. As these are remote branches I would then need to checkout all of them locally? – ptha Jul 17 '19 at 12:12
  • If they have been merged, you won't be able to remove them while also preserving history. What is the size of your current *code* in a clean state? (make a local copy of your repo, remove all non versioned files and then remove the .git directory) – A. M. Jul 17 '19 at 13:25
  • The current code in a clean state as you requested is 121 MB. – ptha Jul 17 '19 at 15:36
  • The 121 MB is for 1 of 10 of the long lived branches - there should be common code between them. Some of the short lived feature branches are merged, some are not. The reason I want to test how big the repo is without the short lived branches, is that I want to see if I can keep the long lived branches (with history) in a new forked repo structure (1 fork for each long lived branch), or I just need to transplant the current code (no history) for all of the long lived branches into each separate fork. - does that make sense? – ptha Jul 17 '19 at 15:44
  • Yeah, unless those 121 MB contain many binary files that have changed often, the whole repo with cleaned history (BFG) shouldn't weigh more than 300-400 MB (ballpark estimate). So unless you can bring it down to that size with BFG, I would consider ditching that repo and starting from scratch again. – A. M. Jul 17 '19 at 15:48
1

I just wanted to post the steps I took when following A. M.'s answer: https://stackoverflow.com/a/57074193/1860867

The repo has nearly 600 branches, and I wanted to test the effect of deleting nearly 350 of them.

  1. I cloned the repository as normal.
  2. I needed to create local branches for all 600 branches. I found a script to do that:
for i in $(git branch -a | grep remote | grep -v HEAD | grep -v master); do git branch --track "${i#remotes/origin/}" "$i"; done
  3. To verify the correct number of branches, I ran the following to count the number of local branches:
git branch | wc -l
  4. Our branch names follow a naming convention (luckily), and I wanted to test deleting all branches except those starting with the prefix main_. I was able to delete all other branches with the following script:
git branch | grep -v "main_" | xargs git branch -D

I ran the command from step 3 again to check the number of branches, and also ran plain git branch to check that the branch names were all correct.

  5. To remove my remotes, I first listed them all:
git remote -v

In this case the remote for both fetch and push was origin. To remove it, I ran:

git remote rm origin

I then ran git remote -v again, which returned nothing, confirming that the removal was successful.

  6. To clean my local repository (and determine the size reduction after deletion of the local branches) I ran the command:
git reflog expire --expire=now --all && git gc --prune=now --aggressive

After running git gc, the repo size was reduced by ~600 MB; git-sizer showed similar reductions.
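The steps above can be rolled into one script that also records the size before and after, using git count-objects. This is only a sketch under a few assumptions: it is run inside a throwaway clone, the branches to keep share a name prefix, and the remote is called origin. repo_size_kib and prune_to_prefix are hypothetical helper names, not git commands.

```shell
#!/usr/bin/env bash
# Sketch only: run inside a throwaway clone, never in your working copy.

# Print the size of loose plus packed objects in KiB, as reported by git.
repo_size_kib() {
  git count-objects -v |
    awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}

# Keep only local branches starting with prefix $1, remove remote $2
# (assumed default: origin), garbage-collect, and report the size change.
prune_to_prefix() {
  local prefix="$1" remote="${2:-origin}" before after current ref name

  # Step 2: create a local branch for every remote-tracking branch.
  git for-each-ref --format='%(refname)' "refs/remotes/$remote/" |
  while read -r ref; do
    name="${ref#refs/remotes/"$remote"/}"
    [ "$name" = HEAD ] && continue
    git show-ref --verify --quiet "refs/heads/$name" ||
      git branch --quiet --track "$name" "$ref"
  done

  before="$(repo_size_kib)"

  # Step 4: delete every local branch that lacks the prefix. The checked-out
  # branch is skipped because git refuses to delete it.
  current="$(git symbolic-ref --quiet --short HEAD || true)"
  git for-each-ref --format='%(refname:short)' refs/heads/ |
  while read -r name; do
    case "$name" in
      "$prefix"*) ;;   # keep: matches the prefix
      "$current") ;;   # keep: currently checked out
      *) git branch --quiet -D "$name" ;;
    esac
  done

  # Steps 5 and 6: drop the remote, expire reflogs, repack.
  # Tags are left untouched and still keep their commits reachable.
  git remote remove "$remote"
  git reflog expire --expire=now --all
  git gc --quiet --prune=now --aggressive

  after="$(repo_size_kib)"
  echo "before=${before}KiB after=${after}KiB saved=$((before - after))KiB"
}
```

For the case described here the call would be prune_to_prefix main_. Part of any saving may come from the aggressive repack itself rather than from the deleted branches, so it can be worth running git gc --aggressive on an untouched clone as a baseline for comparison.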

ptha