
I was trying to use BFG to remove some files/folders that had accidentally been uploaded to my remote Git repository, and after following a guide, I seem to have duplicated commits: one set of branches that is purged of the data, and one set that still has the data. Here is the network graph demonstrating this: https://github.com/barricklab/pLannotate/network

I first:

git clone --mirror https://github.com/barricklab/pLannotate.git

I then used several commands similar to:

bfg --delete-files *.gbk

and eventually used:

git reflog expire --expire=now --all && git gc --prune=now --aggressive

git push

I realized I had a local commit that wasn't pushed before I cloned, so maybe this had something to do with it? I'm not sure. At this point, I'm terrified of doing more damage to the repository, and I'm not sure how to remove the alternate set of branches that still contain the files I was trying to remove.

The very first commits to the repository after the initial branching highlight the "good" (files removed) and "bad" (files still present) branches:

https://github.com/barricklab/pLannotate/commit/e146338a62cda43f4d09df90ce90472807f0b60b
https://github.com/barricklab/pLannotate/commit/01b5ee7bbb697d3aba30d4d2944ae716dfc53ab9
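One way to tell the "bad" branches from the "good" ones is to ask Git which refs can still reach a commit that touched a `.gbk` file. The sketch below first builds a throwaway repository so it is safe to run anywhere; `app.py`, `example.gbk` and the `bad` branch are hypothetical stand-ins, and it assumes a reasonably recent `git` on your PATH (for `for-each-ref --contains`). In a real clone you would run only the final command. Note that a ref is reported if any commit it contains added, changed or deleted a matching path.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository standing in for the real one (names are hypothetical)
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo code > app.py
git add -A && git commit -qm "initial"

# A "bad" branch that still carries a .gbk file
git checkout -q -b bad
echo data > example.gbk
git add -A && git commit -qm "add example.gbk"

# The "good" branch never sees a .gbk file
git checkout -q main
echo more >> app.py
git commit -qam "more code"

# Commits on any ref whose changes touch a *.gbk path, then every ref
# (branch, tag or remote-tracking name) that still contains one of them
dirty=$(git log --all --format=%H -- '*.gbk' |
  while read -r c; do
    git for-each-ref --contains "$c" --format='%(refname)'
  done | sort -u)
echo "$dirty"
```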

Can anyone help me get out of this pickle and remove this duplicate set of branches?

mmcguffi

1 Answer


... after following [the] guide, I seem to have duplicated commits

This is how The BFG works.

This is how anything that does this sort of job with Git works, because no commit can ever be changed. It is literally impossible to "fix" a bad commit. The only thing anyone or anything can do is make a new "duplicate" (but slightly different) commit, which gets a new and different hash ID.
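A quick way to see this for yourself: "fixing" a commit with `git commit --amend` produces a commit with a new hash, while the original commit object stays in the repository. This is a minimal sketch in a throwaway repository; `app.txt` and `big.bin` are hypothetical names.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo code > app.txt
echo big > big.bin
git add -A && git commit -qm "bad: includes big.bin"
old=$(git rev-parse HEAD)

# "Fix" the commit: stop tracking big.bin and amend
git rm -q --cached big.bin
git commit -q --amend -m "good: no big.bin"
new=$(git rev-parse HEAD)

echo "old: $old"
echo "new: $new"
# The original commit still exists; it has merely lost its branch name
git cat-file -t "$old"   # prints: commit
```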

Because commits form chains, and Git works backwards from the last commit to the first, any change you want made requires updating every subsequent commit even if the file-snapshots of the subsequent commits are 100% identical to the originals:

A  <-B  <-C  <-D  <-E  <-F  <-G  <-H   <--main
                    ^
                    |
   let's say this commit is bad: has a big file

To "fix" this big-file problem, even though the big file is removed in commit F, we must copy commit E to a new-and-improved commit E':

A--B--C--D--E--F--G--H
          \
           E'

Once we've done that, we must now copy commit F to a new-and-improved F', with the one change being that F' points back to E', rather than to the original (bad) E:

A--B--C--D--E--F--G--H
          \
           E'-F'

Once we've done that, we're forced to copy G for the same reason, and again with H. The final result is:

           E--F--G--H   [abandoned]
          /
A--B--C--D--E'-F'-G'-H'   <--main
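The copying can be mimicked by hand with Git's plumbing commands. This is not how The BFG is implemented internally; it is only an illustration, in a throwaway repository with hypothetical files and a hypothetical `snap` helper, of why the copies of F and G get new hashes even though their snapshots are unchanged. `E1`, `F1` and `G1` play the roles of E', F' and G'.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: commit everything and print the new hash
snap() { git add -A && git commit -qm "$1" && git rev-parse HEAD; }

# Build D-E-F-G; E adds big.bin, F removes it again
echo v1 > app.txt;                    D=$(snap D)
echo v2 > app.txt; echo x > big.bin;  E=$(snap E)
rm big.bin;                           F=$(snap F)
echo v3 > app.txt;                    G=$(snap G)

# E1 (E'): E's snapshot without big.bin, still on top of D
git read-tree "$E"
git rm -q --cached big.bin
E1=$(git commit-tree "$(git write-tree)" -p "$D" -m "E'")

# F1 and G1 reuse the untouched snapshots but must point at new parents
F1=$(git commit-tree "$(git rev-parse "$F^{tree}")" -p "$E1" -m "F'")
G1=$(git commit-tree "$(git rev-parse "$G^{tree}")" -p "$F1" -m "G'")

# Move main to the copies; E-F-G are now abandoned
git update-ref refs/heads/main "$G1"
git reset -q --hard
```

The last snapshot is byte-for-byte the same (`G^{tree}` equals `G1^{tree}`), yet `G` and `G1` are different commits, purely because their parent chains differ.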

The BFG and other Git fixers will, if/when appropriate, discard the old commits entirely (Git likes to hang on to them as long as possible). But if you introduce this new repository to the old repository again, the old repository will say, "Oh, I see you're missing these commits, E-F-G-H," and give them right back to you and let you merge them:

           E---F--G---H
          /            \
A--B--C--D--E'-F'-G'-H'-M   <--main

and now you have the old commits and the new commits. The solution to this is to make sure you never touch the new repository, with the altered commits, to any of the old repositories, so that the Git using the old repository can't give you back the old commits you purged when you made the new ones.

In other words, don't rejoin a filtered repository with its pre-filtered version or you'll bring back everything you just worked so hard to get rid of.
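Here is a sketch of exactly that mistake, using two local throwaway repositories under a temp directory in place of the hosting site and your filtered clone, plus a check that may be worth running before any push: `git merge-base --is-ancestor` succeeds if the old, unfiltered commit is reachable from `HEAD`. The paths, file names and the `old` variable are assumptions; in practice you would use the hash of a known pre-filter commit.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
base=$(mktemp -d)

# "old": the repository as it was before filtering
git init -q "$base/old"
cd "$base/old"
git symbolic-ref HEAD refs/heads/main
echo v1 > app.txt
git add -A && git commit -qm D
echo v2 > app.txt; echo big > big.bin
git add -A && git commit -qm "E: adds big.bin"
old=$(git rev-parse HEAD)

# "new": a clone whose history is rewritten (E amended to drop big.bin)
git clone -q "$base/old" "$base/new"
cd "$base/new"
git rm -q big.bin
git commit -q --amend -m "E': no big.bin"

# The mistake: pulling from the unfiltered repository merges E back in
git pull -q --no-rebase --no-edit origin main

# Safety check before pushing anywhere
if git merge-base --is-ancestor "$old" HEAD; then
  echo "HEAD contains pre-filter commit $old: do not push" >&2
  rejoined=yes
else
  rejoined=no
fi
```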

Fixing the mess if you've rejoined the old commits

To remove such a merge as M above, assuming you've just added it, you'd generally want to run git reset --hard HEAD^ or git reset --hard HEAD~. (Both of these do the same thing, although some command line interpreters make one or the other easier to type in: CMD.EXE in particular makes you type ^^ instead of ^ so ~ is easier. Note that you can, but don't have to, add 1 after ^ or ~ as well.)
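A minimal sketch of the reset, assuming the unwanted merge is the commit your branch is currently on (throwaway repository, hypothetical file names). `HEAD~` names the merge's first parent, which is the commit your branch was on before the merge.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo v1 > app.txt
git add -A && git commit -qm D
echo v2 > app.txt; echo big > big.bin
git add -A && git commit -qm "E: adds big.bin"
old=$(git rev-parse HEAD)

# Rewritten history: E' keeps the app.txt change but not big.bin
git rm -q big.bin
git commit -q --amend -m "E': no big.bin"
new=$(git rev-parse HEAD)

# The unwanted merge M that brings E (and big.bin) back
git merge -q --no-edit "$old"

# Drop M; HEAD~ (same as HEAD~1 or HEAD^) is M's first parent, E'
git reset -q --hard HEAD~
```

Afterwards `HEAD` is E' again, the old commit is no longer reachable from it, and `big.bin` is gone from the working tree.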

Depending on what you use to view commits, you may well still see both the old and new commits. What you should no longer see, after the reset, is the added merge commit: the old and new commits will be separate "strands".
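One likely reason both strands stay visible is that other names still point at the old commits, most often remote-tracking names such as `origin/main` that still describe the unfiltered hosting copy. `git for-each-ref --contains` lists every ref that can reach a given commit. The sketch uses throwaway repositories and hypothetical names.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
base=$(mktemp -d)

# Unfiltered repository with a bad commit E
git init -q "$base/old"
cd "$base/old"
git symbolic-ref HEAD refs/heads/main
echo v1 > app.txt
git add -A && git commit -qm D
echo v2 > app.txt; echo big > big.bin
git add -A && git commit -qm "E: adds big.bin"
old=$(git rev-parse HEAD)

# Clone and rewrite E locally
git clone -q "$base/old" "$base/new"
cd "$base/new"
git rm -q big.bin
git commit -q --amend -m "E': no big.bin"

# Every ref that can still reach the old commit
refs=$(git for-each-ref --contains "$old" --format='%(refname)')
echo "$refs"
```

Here the local `main` is clean, but `refs/remotes/origin/main` still reaches the old commit, so a graph viewer that shows remote-tracking names will keep drawing it.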

To update a GitHub or Bitbucket or other hosting site, you must force it to replace the old commits with the new-and-improved commits. There are two options here:

  • Remove or rename the old repository, so that it no longer exists on the hosting site, or exists under some different name. Create a new, empty repository on the hosting site, and use git push from the local repository. You may want to use git push --mirror, which automatically pushes all branches and tags, but note that this also pushes all remote-tracking names, which you might not want to do. You may instead want git push --all --tags.

  • Or, use git push --force, again perhaps with --mirror or --all --tags.
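Either option can be rehearsed locally before touching the hosting site. In this sketch a bare repository under a temp directory stands in for GitHub (an assumption; nothing here talks to GitHub), and `git commit --amend` stands in for the BFG rewrite. It shows the ordinary push being refused, then `--force` replacing the branch and the tag. Branches and tags are pushed with two separate commands, which sidesteps any question of whether a given Git version accepts `--all` and `--tags` together.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
base=$(mktemp -d)

# Hypothetical stand-in for the hosted repository
git init -q --bare "$base/hosted.git"

git init -q "$base/work"
cd "$base/work"
git symbolic-ref HEAD refs/heads/main
echo v1 > app.txt; echo big > big.bin
git add -A && git commit -qm "bad: has big.bin"
git tag v0.1
git remote add origin "$base/hosted.git"
git push -q --tags origin main
old=$(git rev-parse HEAD)

# Local rewrite (stand-in for BFG): drop big.bin and move the tag
git rm -q big.bin
git commit -q --amend -m "good: no big.bin"
git tag -f v0.1 >/dev/null

# An ordinary push is refused: the new main does not contain the old one
if git push -q origin main 2>/dev/null; then plain_push=accepted; else plain_push=rejected; fi

# Force-replace every branch, then every tag
git push -q --force --all origin
git push -q --force --tags origin
```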

Note that with git push --force, you're losing your backup on the hosting site, so be very sure that you have the right set of commits here. The BFG does an in-place rewrite; some other repository-adjusters, such as git filter-repo, require that you run them on a freshly made clone, so that you aren't damaging any "regular work" clone and still have it as a backup.

In all cases, consider making your own personal backups before doing anything. It's almost always easier to restore from a personal backup that you just made just now, than it is to restore from some standard backup that you hope was made last week but it turns out that the backup system died last year and no one got around to fixing it because everything has been just fine, why do you ask?
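For the personal backup, one option (copying the whole directory, as the asker did, works too) is `git bundle`, which packs every branch and tag into a single file that can later be cloned from. A minimal sketch with throwaway paths and hypothetical names:

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
base=$(mktemp -d)

git init -q "$base/repo"
cd "$base/repo"
git symbolic-ref HEAD refs/heads/main
echo v1 > app.txt
git add -A && git commit -qm "first"
git tag v1

# One file holding every branch and tag; copy it somewhere safe
git bundle create "$base/backup.bundle" --all
git bundle verify "$base/backup.bundle"

# Restoring: a bundle can be cloned like any remote
git clone -q "$base/backup.bundle" "$base/restored"
```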

torek
  • Thank you for this very detailed and thoughtful answer, @torek. This is much more clear than other explanations that I have seen. With my very last commit, it seems like I accidentally merged the old copy back into the new, which makes sense why both copies are still present. How would I "undo" this new tangle? I tried `git reset --hard HEAD~1`, though that seems to just create a new commit, further complicating the problem. – mmcguffi Jul 22 '21 at 00:10
  • `git reset --hard HEAD~1` is probably the right thing (although without access to your repository, I can't be 100% sure). The tricky bit is that having obtained the old commits from the old repository, you're still in the situation where you can *see* both sets of commits. What you'll see depends on the commit-viewer (and how you invoke it, depending on the viewer). – torek Jul 22 '21 at 01:24
  • I added an update, a "how to fix it" section. – torek Jul 22 '21 at 01:35
  • After using `git reset --hard commit_sha`, I still see the merge for the remote repository in Kraken, though the local copy is not merged. Am I alright to `git push --force` now? Alternatively, I also take hourly backups of my computer, and out of extra caution I did in fact copy my folder before I attempted anything with BFG -- is there an easy way to have the hosting site (GitHub) use that copied folder instead of the mess that is currently hosted there? Would I simply move into that directory and `git push --force`? – mmcguffi Jul 22 '21 at 01:58
  • 1
    Yes, you can just `git push --force master` or `git push --force --all --tags` or whatever from the pre-whatever backup, to set the GitHub one that way. (See any of my long explanations about what `git push` does, including with or without `--force`.) – torek Jul 22 '21 at 02:02
  • This was the solution that worked for me in the end. I did not use `git reset --hard commit_sha`, though that seems like it would have worked as well. +1 to always having backups! Thank you again for the detailed response; I understand this much better now. I would buy you a beverage @torek if I could! – mmcguffi Jul 22 '21 at 15:48