
We are trying to shrink our git repository to under 500MB due to deployment issues.

To achieve that, we have created a new branch where we have moved all old images, videos and fonts to AWS S3.

I can easily get the list of deleted files with: git diff --name-only --diff-filter=D master -- public/assets/

I have tried running BFG-repo-cleaner 1.14.0 once per file, but I have 400 files and deleting each file separately is taking ages (it is still running as I write this).

git diff --name-only --diff-filter=D master -- public/assets/ | xargs -i basename '{}' | xargs -i bfg --delete-files '{}'

Since each file name is distinct, I cannot really use a glob pattern, as suggested in "Delete multiple files from multiple branch using bfg repo cleaner".

I tried separating the file names with commas, but that resulted in BFG-repo-cleaner telling me:

BFG aborting: No refs to update - no dirty commits found??

Is there a way to provide multiple files to BFG-repo-cleaner without a glob pattern?

PS. The command I tried with multiple files is: git diff --name-only --diff-filter=D master -- public/assets/ | xargs -i basename '{}' | sed -z 's/\n/,/g;s/,$/\n/' | xargs -i bfg --delete-files '{}' && git reflog expire --expire=now --all && git gc --prune=now --aggressive

PPS. The bfg command is on my PATH as a simple bash script with java -jar /tools/BFG-repo-cleaner/bfg-1.14.0.jar "$@"

dotnetCarpenter

1 Answer


But I have 400 files and it is taking ages to delete each file separately

For that many files, a better fit is newren/git-filter-repo, a Python-based tool (see its installation instructions).

That way, you can feed that tool a file, with the list of files in it:

git filter-repo --paths-from-file <filename> --invert-paths

From the documentation:

Similarly, you could use --paths-from-file to delete many files.

For example, you could run git filter-repo --analyze to get reports, look in one such as .git/filter-repo/analysis/path-deleted-sizes.txt and copy all the filenames into a file such as /tmp/files-i-dont-want-anymore.txt, and then run:

git filter-repo --invert-paths \
                --paths-from-file /tmp/files-i-dont-want-anymore.txt

to delete them all.
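
To tie this back to the question, here is a minimal sketch of how the list file could be produced from the same git diff command. The function name write_deleted_paths and its default arguments are hypothetical, and it assumes it is run from the repository root on the branch where the files were removed.

```shell
#!/usr/bin/env bash
# Sketch: collect the paths deleted relative to a base branch into a list file
# that `git filter-repo --paths-from-file` can read (one path per line).
# write_deleted_paths and its defaults are hypothetical names for illustration.

write_deleted_paths() {
  local base="${1:-master}"          # branch to compare against
  local dir="${2:-public/assets/}"   # only consider deletions under this directory
  local out="${3:-/tmp/files-i-dont-want-anymore.txt}"

  # Keep the full repo-relative paths: `bfg --delete-files` matches bare file
  # names, but filter-repo matches on paths, so `basename` is not needed here.
  git diff --name-only --diff-filter=D "$base" -- "$dir" | sort -u > "$out"
}

# Usage (this rewrites history, so try it on a disposable clone first):
#   write_deleted_paths master public/assets/ /tmp/files-i-dont-want-anymore.txt
#   git filter-repo --invert-paths --paths-from-file /tmp/files-i-dont-want-anymore.txt
```

With this approach the whole list is handled in a single filter-repo run instead of one rewrite of the history per file.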

VonC
  • Thanks @VonC! Looks like a much better tool. However, I get an error from a fresh `git clone` (possibly due to switching from the main branch) that I don't understand: "Aborting: Refusing to destructively overwrite repo history since this does not look like a fresh clone. (expected at most one entry in the reflog for HEAD) Please operate on a fresh clone instead. If you want to proceed anyway, use --force." – dotnetCarpenter Dec 17 '21 at 01:00
  • @dotnetCarpenter Make sure your `git status` is clean before launching that command. – VonC Dec 17 '21 at 01:03
  • Our `.git` directory shrank from 454MB to 338MB after using `git filter-repo`, `git reflog expire` and `git gc`. :) Does `git filter-repo` also take care of other branches and tags? – dotnetCarpenter Dec 17 '21 at 01:04
  • Nice! I see that `git filter-repo` also fixed our branches. And it is super fast! – dotnetCarpenter Dec 17 '21 at 01:09
  • @dotnetCarpenter Well done. It can operate on all branches. – VonC Dec 17 '21 at 01:12
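
On the "does not look like a fresh clone" error from the comments: the message itself points at the reflog for HEAD, and switching branches after cloning adds entries there, which is likely what tripped the check. One way around it, sketched below under assumptions, is to build the path list in the existing working clone first and then run the filter in a brand-new clone without checking anything out. The function name filter_fresh_clone and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run git filter-repo in a brand-new clone so that the fresh-clone
# safety check passes without --force.
# filter_fresh_clone and its arguments are hypothetical names for illustration.

filter_fresh_clone() {
  local source_repo="$1"   # URL or path of the repository to clone
  local work_dir="$2"      # new directory for the throwaway clone
  local path_list="$3"     # absolute path to a file with one path per line

  # --no-local only matters when cloning from a local path; it is included
  # here on the assumption that a plain local clone may not count as fresh.
  git clone --no-local "$source_repo" "$work_dir" || return 1

  # -C runs the command inside the new clone without changing directory,
  # which is why path_list should be an absolute path.
  git -C "$work_dir" filter-repo --invert-paths --paths-from-file "$path_list"
}

# Usage:
#   filter_fresh_clone <repository-url> /tmp/repo-clean /tmp/files-i-dont-want-anymore.txt
```

The error message also offers --force as an alternative, but that skips the check meant to protect a clone that may hold unpushed work.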