34

I accidentally committed some large binary data into some commits. Since then I've updated my .gitignore, and those files are no longer being committed. But I'd like to go back into the older commits and selectively prune out this data from the repository, removing a couple directories that should have been in .gitignore. I don't want to remove the commits themselves.

How would I go about accomplishing this? My preferred method would be some way to retroactively apply the .gitignore rules to old commits. An answer that uses this method would also be generally useful to others, since I'm sure my problem is not unique, and it would be quick to apply without lots of customization specific to each user's directory structure.

Is this possible, either the easy way I suggest above, or in some more complicated manner?

TonySalimi
Myrddin Emrys
  • 5
    See [git: forever remove files or folders from history](http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/) and [Remove large binary files from repository](https://groups.google.com/forum/#!topic/github/ghXxynyhj0o). They should help. – moinudin Dec 30 '10 at 18:31
  • 1
    Googling for 'git remove file from history' would've solved your problem. By the way, rebasing a change to .gitignore into the early history and then somehow retroactively applying the .gitignore to all commits is likely not going to help much, because sometimes ignored files do get intentionally checked in, and you wouldn't want to lose those. – Jo Liss Dec 31 '10 at 17:38

2 Answers

12

The solution in this answer worked perfectly for me:

You can also test your clean process with a tool like BFG Repo-Cleaner, as in this answer:

java -jar bfg.jar --delete-files '*.{jpg,png,mp4,m4v,ogv,webm}' ${bare-repo-dir}

(Note that BFG protects your latest commit by default and will not delete anything in it, so you need to remove those files from the current index and make a "clean" commit first. All earlier commits will then be cleaned by BFG.)
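The "clean" commit mentioned above can be made with plain git before running BFG. Here is a minimal sketch, assuming your updated .gitignore is already in the working tree; untrack_ignored is a hypothetical helper name, not part of git or BFG:

```shell
#!/usr/bin/env bash
# Sketch: stop tracking files that the current .gitignore now matches,
# so the latest commit is "clean" before the older commits are rewritten.
# untrack_ignored is a hypothetical helper name, not a git or BFG command.
untrack_ignored() {
  local repo="${1:-.}"
  # -c: tracked (cached) files, -i: only those matching the ignore rules.
  if [ -z "$(git -C "$repo" ls-files -c -i --exclude-standard)" ]; then
    echo "nothing to untrack"
    return 0
  fi
  # --cached removes the files from the index only; copies on disk are kept.
  git -C "$repo" ls-files -c -i --exclude-standard -z |
    xargs -0 git -C "$repo" rm --cached --quiet --
  git -C "$repo" commit --quiet -m "Stop tracking ignored files"
}
```

Call it as untrack_ignored path/to/repo (or with no argument from inside the repository). It only touches the index, so the binary files stay in your working directory.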

Community
Ed.
  • 3
    I (accidentally) had a folder with all the binary files, and with **java -jar bfg.jar --delete-folders downloads** my repo went from 250 MB to 20 MB! – António Almeida Dec 03 '13 at 00:27
  • @akauppi - His answer is about git. He is referencing a Java tool (bfg) that removes files from a git repository based on, it appears, the filename extension. – mwakerman Sep 02 '15 at 03:47
  • @mwakerman I found this looking to remove files other than java, so a general .git solution would be preferable for me here. – ryanjdillon Sep 22 '15 at 10:27
0

A (relatively) new tool has since replaced git filter-branch, which used to be the best answer to this question. git filter-repo is a Python tool that can handle nearly any history rewriting you need to do in git.

For this example (removing specific folders or files from a repo) I could run the command like this:

git filter-repo --path bin --path-glob '*.tar.gz' --invert-paths

This filters out any content that is in the given folder or matches the given glob pattern. As with any tool that rewrites git history, you should either catch the problem early, before your commits are shared with others, or be very familiar with git-rebase and with recovering from difficult changes.
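To get closer to the question's idea of retroactively applying .gitignore, one option is to list every path that ever appeared in history and keep only those the current rules match. This is a rough sketch rather than a tested recipe for every repository: list_ignored_history_paths is a hypothetical helper name, it needs a non-bare clone with the .gitignore checked out, and paths containing unusual characters may come out quoted.

```shell
#!/usr/bin/env bash
# Sketch: print every path from any commit that the current .gitignore matches.
# list_ignored_history_paths is a hypothetical helper name, not a git command.
list_ignored_history_paths() {
  local repo="${1:-.}"
  # All paths touched by any commit on any ref, de-duplicated.
  git -C "$repo" -c core.quotePath=false log --all --pretty=format: --name-only |
    sed '/^$/d' | sort -u |
    # --no-index: apply the ignore rules even to paths that are (or were) tracked.
    git -C "$repo" check-ignore --no-index --stdin
}

# Possible usage in a fresh clone (not run here; review the list first):
#   list_ignored_history_paths . > ../ignored-paths.txt
#   git filter-repo --invert-paths --paths-from-file ../ignored-paths.txt
```

Note that git check-ignore exits with status 1 when nothing matches, so an empty list also yields a non-zero status. Writing the list to a file first lets you check that no intentionally committed file is about to be removed, which is the concern raised in the comments on the question.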

Myrddin Emrys
  • I needed this again, years later (all mistakes eventually become repeat mistakes), only to discover that the best answer had been rightly removed for being a link-only answer. So I did some new research and answered my own question. I hope this helps someone else. – Myrddin Emrys May 05 '22 at 18:38