I forked another repository, and then added a number of files to it. I occasionally merge in changes from the original repository to stay up-to-date.

I realized I have some files in my fork which should be removed, so I am trying to follow [1] to remove some files from my git repository. The source repo has thousands of commits, while I have a few hundred.

When I run the command below, it walks every commit, including the entire upstream history, rather than just the commits in my fork. That would take hours instead of minutes.

git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch filename.txt' --prune-empty @

There are numerous merges from upstream in between.

master         A---B---C---D---E---F---G (HEAD)
                  /       /
upstream/master  H---I---J---K

[1] https://help.github.com/en/github/authenticating-to-github/removing-sensitive-data-from-a-repository

Prod

2 Answers

I would recommend using the newer tool git filter-repo, which replaces BFG and git filter-branch.

Note: if you get the following error message when running the commands below:

Error: need a version of `git` whose `diff-tree` command has the `--combined-all-paths` option

it means you have to update git.


See "Path based filtering":

git filter-repo --path file-to-remove --invert-paths

You can combine it with ref filtering. First, add the original repo (the one you forked) as a remote in your local clone:

cd /path/to/local/clone/of/my/fork
git remote add upstream /url/original/repo
git fetch upstream

That way, you can limit the filtering to only your fork commits.
Here is an example assuming you have added commits on top of upstream/master.

git filter-repo --path file-to-remove --invert-paths \
  --refs upstream/master..<myBranch>
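
Before rewriting anything, it can help to check what the range upstream/master..<myBranch> actually selects. The sketch below builds a throwaway sandbox (the directory names, file names, and commit messages are all hypothetical) with a fork that has merged from upstream once, then compares the full commit count with the count inside the range. It only previews the range with git rev-list; it does not run git filter-repo itself.

```shell
#!/bin/sh
set -e
# Fixed identity so commits work without any global git config (illustrative values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in for the original repository: three commits on master.
git init -q original
git -C original symbolic-ref HEAD refs/heads/master
for i in 1 2 3; do
  echo "$i" > "original/up$i.txt"
  git -C original add .
  git -C original commit -q -m "upstream $i"
done

# Stand-in for the fork: clone it, name the remote "upstream", add two commits.
git clone -q original fork
git -C fork remote rename origin upstream
for i in 1 2; do
  echo "$i" > "fork/mine$i.txt"
  git -C fork add .
  git -C fork commit -q -m "fork $i"
done

# Upstream moves on, and the fork merges it in (creates a merge commit).
echo 4 > original/up4.txt
git -C original add .
git -C original commit -q -m "upstream 4"
git -C fork fetch -q upstream
git -C fork merge -q --no-edit upstream/master

# Everything reachable from master vs. only what upstream/master does not have.
all=$(git -C fork rev-list --count master)
mine=$(git -C fork rev-list --count upstream/master..master)
echo "all=$all mine=$mine"   # prints: all=7 mine=3
```

The three commits in the range are the two fork commits plus the merge commit; the four upstream commits are excluded even though one of them was merged in.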
VonC
  • Would upstream/master.. work even if there are multiple branches in between? I want to operate on A..G without touching H..K – Prod Jan 18 '20 at 19:56
  • @Prod It should, but test it on a cloned copy of the repo, to see if it does work as advertised. – VonC Jan 18 '20 at 22:03

I haven't looked at the filter-repo command yet, but you can give filter-branch the exact list of commits you want it to examine: everything after a -- argument is passed to the rev-list that filter-branch runs to generate its candidates. For example:

git filter-branch --index-filter "$myfilter" -- --first-parent HEAD
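
Since the arguments after -- go to rev-list, the same arguments can be tried with git rev-list directly to see which commits would be candidates. A minimal sketch, assuming a throwaway repository with one merged side branch (the repository name, branch name, and the commit helper are hypothetical):

```shell
#!/bin/sh
set -e
# Fixed identity so commits work without any global git config (illustrative values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q repo
git -C repo symbolic-ref HEAD refs/heads/master

# Hypothetical helper: create one file and commit it under the given name.
commit() {
  echo "$1" > "repo/$1.txt"
  git -C repo add .
  git -C repo commit -q -m "$1"
}

commit base                       # shared starting point on master
git -C repo checkout -q -b side
commit s1                         # two commits that only exist on the side branch
commit s2
git -C repo checkout -q master
commit m1                         # one more commit on master
git -C repo merge -q --no-edit side   # merge commit; its first parent is m1

# Without --first-parent: base, s1, s2, m1 and the merge are all listed.
full=$(git -C repo rev-list --count HEAD)
# With --first-parent: only the merge, m1 and base; s1 and s2 are skipped.
first=$(git -C repo rev-list --count --first-parent HEAD)
echo "full=$full first_parent=$first"   # prints: full=5 first_parent=3
```

Commits that were merged in from the other side are never visited with --first-parent, which is what keeps the upstream history out of the candidate list.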
jthill
  • I used this in combination with https://stackoverflow.com/questions/15250070/running-filter-branch-over-a-range-of-commits. I used the first commit from my branch to HEAD and it worked. git filter-branch --index-filter $myfilter --prune-empty -- --first-parent sha1..HEAD – Prod Mar 15 '20 at 23:03