I'm having some trouble deleting some commits from Git

Question

Currently my repo is as pictured:

I need to delete the top 3 commits as they are the product of faulty automated processes. I want to really delete those commits so that I can get rid of their artifacts cluttering the repo. One of the commits made 2000+ incorrect modifications to code, and I want to delete those 3 commits and GC their artifacts.

To that end, I ran these commands:

git reset --hard 3e2fb0bcd1471d33c940a4da3809ce6c48dd1c32
git push -f origin master

And that got me this far:

Which is as I expected. However, I have two problems:

1) those commits are still there. They are not part of the codebase's history; they are a mistake and I don't want to keep them in the repo. How do I erase them completely from the repo?

2) when I run

git pull origin master

the repo reverts to the state depicted in image 1. I thought I had synchronized both the local and the remote repos! Why is this happening? How do I ensure that commit 3e2fb0... is the 'latest' commit?

in gitk you can just reload, they will disappear. Later they will be garbage collected. If you want them garbage collected earlier, search for that topic. — tkruse, Dec 05 '17 at 13:24
In step 2, when you call 'git pull origin master', you should not go back to having the commits. More likely you are not telling us some step you did, or somebody (or some automatic system), re-pushed these commits. — tkruse, Dec 05 '17 at 13:26
Deleting commits on remote is also possible using colon notation, easy to find with google. — tkruse, Dec 05 '17 at 13:28
One extra detail: I'm the only committer so I can do whatever write ops I want on the repository. — David C, Dec 05 '17 at 16:56
@tkruse I did reload; I did F5 and restarted the application: they're still there. It is possible that you think "they are gone" because you don't use the --all parameter. — David C, Dec 06 '17 at 09:22
@tkruse "you should not go back to having the commits" I agree with you, I shouldn't - that's what brought me to StackOverflow; "More likely you are not telling us some step you did" that would be pretty dumb on my part... why would I do that? "or somebody (or some automatic system), re-pushed these commits" No, I'm the only committer and the situation I describe holds true even when I paste all 3 commands to the shell in quick succession ; — David C, Dec 06 '17 at 09:28
´gitk --all´ would also not show the commits, but there are arguments to gitk which would show them still, because they are still there. If you can easily reproduce several times, paste the complete shell history of one run here with all messages from git, maybe there is a warning that is relevant. — tkruse, Dec 06 '17 at 10:05

score 0 · Answer 1 · answered Dec 05 '17 at 16:38

[git reset --hard <hash> and git push -f origin master succeeded, but]

1) those commits are still there.

Yes. Commits are (mostly) permanent: those three hash IDs represent those three commits, and vice versa. What you do in Git is take away your own name for those commits (git reset, which changes your master so that it no longer names one of those three hash IDs), and then take away the other Git's name as well (git push -f origin master), so that its master no longer names one of those hash IDs either.

(I don't know the hash ID or I would use that here. A hash ID is the "true name" of the commit; names like master are just symbols that allow us, or Git, to find one specific commit, whose parent(s) find more commit(s), whose parents find yet more commits, and so on. This is the "reachability" idea: a commit that is reachable from any name by this parent-following process is "alive" in the repository. A commit that is not reachable is not quite alive: it's either in the process of eventually dying, or in the process of being created.)
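To make the reachability idea concrete, here is a minimal sketch in a throwaway repository. The identity, file name, and commit messages are made up for the demo; only the git commands themselves matter.

```shell
# Throwaway repository; identity and file names are assumptions for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from the question

echo good > file.txt
git add file.txt
git commit -q -m "good commit"
good=$(git rev-parse HEAD)

echo bad > file.txt
git commit -q -a -m "bad automated commit"
bad=$(git rev-parse HEAD)

# Take away the only branch name that reaches the bad commit.
git reset -q --hard "$good"

# No branch reaches it any more ...
branches_with_bad=$(git branch --contains "$bad")
# ... but the object is still stored and can be looked up by hash ID.
bad_type=$(git cat-file -t "$bad")

echo "branches containing the bad commit: '${branches_with_bad}'"
echo "object type behind the bad hash ID: ${bad_type}"
```

The branch list comes back empty while the lookup by hash ID still succeeds, which is the "not quite alive" state described above.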

Your own remote-tracking name, origin/master, retained the old hash ID until you ran git push -f origin master. When your Git saw that their Git had accepted the name-update, your Git updated your copy of their name. That took away the second name you had for that hash ID. If that was all of your names, as it probably was, that suffices for your repository.

If the Git at origin had only one name making those three commits reachable, that one name being master, you have updated that Git / repository so that it no longer has any names for those commits.
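One way to check that all three names agree after the force-push is to compare your master, your origin/master, and what the Git at origin itself reports. The sketch below does this against a local bare repository standing in for the server; the directory names and identity are assumptions for the demo.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master

echo good > file.txt
git add file.txt
git commit -q -m "good commit"
good=$(git rev-parse HEAD)
echo bad > file.txt
git commit -q -a -m "bad automated commit"

# A bare clone plays the part of the server; wire it up as origin.
git clone -q --bare "$work/local" "$work/origin.git"
git remote add origin "$work/origin.git"
git fetch -q origin

# The two commands from the question.
git reset -q --hard "$good"
git push -q -f origin master

local_tip=$(git rev-parse master)             # your own name
tracking_tip=$(git rev-parse origin/master)   # your copy of their name
server_tip=$(git ls-remote origin refs/heads/master | cut -f1)   # their name, asked directly
echo "local:    $local_tip"
echo "tracking: $tracking_tip"
echo "server:   $server_tip"
```

If the third value differs from the first two on a real server, something on that server moved master after your push.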

The real question at this point is then: what other Git repositories have other names for one of those three hash IDs?

... they are a mistake and I don't want to keep them in the repo. How do I erase them completely from the repo?

In general, you don't: it's normally a futile effort. A commit that is unreachable from any name—branch or tag name, or other Git internal name—will eventually expire and be "garbage collected". This is what makes the commit truly go away. Until then, though, anyone with the hash ID can look up the commit by hash ID.
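If you would rather not wait for expiry in your own clone, the usual approach is to expire the reflog entries that still mention the commit and then prune. This is a sketch in a throwaway repository with made-up names. It only affects the repository it runs in, so the Git at origin would need the same treatment from whoever administers that machine.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo good > file.txt
git add file.txt
git commit -q -m "good commit"
good=$(git rev-parse HEAD)
echo bad > file.txt
git commit -q -a -m "bad automated commit"
bad=$(git rev-parse HEAD)

git reset -q --hard "$good"

# Reflog entries still mention the bad commit and protect it from pruning,
# so drop them first, then collect unreachable objects immediately.
git reflog expire --expire=now --all
git gc -q --prune=now

# cat-file -e exits non-zero when the object no longer exists.
if git cat-file -e "$bad" 2>/dev/null; then
    bad_present=yes
else
    bad_present=no
fi
echo "bad commit still stored: $bad_present"
```

Note that this discards every reflog entry, not just the ones for the bad commits, so it also removes the safety net for anything else you might have wanted to recover.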

If no other Git has picked up those three hash IDs, then you have removed the reachability of those three hash IDs from both affected repositories, which is all you really need.

If anyone else does have the three commits, that other Git can offer them (by hash ID and a suggested name) to your Git and/or to the Git that you call origin. Your Git and origin's Git, being Gits, will very likely say, in Borg fashion, "your technological distinctiveness will be added to my repository" and take them in again. It's very difficult to be rid of a commit forever, because once it is made, Git really doesn't want to be rid of it.

2) when I run
git pull origin master
the repo reverts to the state depicted in image 1. I thought I had synchronized both the local and the remote repos! Why is this happening? How do I ensure that commit 3e2fb0... is the 'latest' commit?

If your own git push -f origin master succeeded, but a later git pull origin master brought the three commits back, that means that the Git over on origin decided that its master should point to the tip-most of those three commits.

What would make that happen? Well, suppose there's a third Git that has those three commits, and probably a name for one of them. The user of that third Git runs git push <url> <name>:master or even git push <url> <hash-id>:master. That makes their (third) Git call up the Git at <url>, which is the Git you call origin, and offer to it that particular commit by its hash ID.

The Git at origin takes that hash ID (and re-obtains the commit itself if necessary, if it has managed to garbage-collect the copy it got from you) and looks at its parent IDs, and the parents' parent IDs, and so on, and sees whether it can add that commit's technological distinctiveness to its repository. Sure enough, it can: those three commits easily slot right in atop its own master! So the Git at origin takes those commits, adding them to its repository, and updates its master to point to the tip-most one.
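The scenario just described can be reproduced locally. In this sketch, "you" and "third" are hypothetical clones of a bare repository standing in for origin; all directory names and the identity are assumptions for the demo.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Seed history: one good commit, one bad commit on top.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo good > file.txt
git add file.txt
git commit -q -m "good commit"
good=$(git rev-parse HEAD)
echo bad > file.txt
git commit -q -a -m "bad automated commit"
bad=$(git rev-parse HEAD)

# The server, plus two clones that both have the bad commit.
git clone -q --bare "$work/seed" "$work/origin.git"
git clone -q "$work/origin.git" "$work/you"
git clone -q "$work/origin.git" "$work/third"

# You remove the bad commit locally and on origin.
cd "$work/you"
git reset -q --hard "$good"
git push -q -f origin master
after_force=$(git ls-remote origin refs/heads/master | cut -f1)

# The third Git pushes its master. For origin this is a plain
# fast-forward, so no force is needed.
cd "$work/third"
git push -q origin master
after_third=$(git ls-remote origin refs/heads/master | cut -f1)

# Your next pull brings the bad commit straight back.
cd "$work/you"
git pull -q --ff-only origin master
after_pull=$(git rev-parse master)

echo "origin after your force-push: $after_force"
echo "origin after third's push:    $after_third"
echo "your master after pulling:    $after_pull"
```

The third clone needs no special permissions or flags here, which is why a forgotten clone or an automated job is worth ruling out when the commits reappear.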

This is what makes it so hard to get rid of these things. A commit, once made, easily fits into every clone of a repository, since it matches up with at least one clone. Every Git that calls up another Git tends to get a copy of its latest commits, which spread like viruses through the technologically-grabby Borg-ness of Git.

To really, truly, get rid of a commit, you have to make it unreachable in your repository and unreachable in every other repository that has it so that none of these Gits offers it back to any other Git at any point. Otherwise, like some stubborn virus, it just keeps coming back.

Things you can do if you have a controlled central repository

There are a few things you can do if everyone agrees that some central repository is the "source of truth". For instance, suppose the Git you call origin is on your own master server, or even on GitHub or Bitbucket or some other distribution point, and you have sufficient control of that server/Git. You can arrange for that Git to reject the bad commit by its hash ID, or by some other trickery.

The "refuse by hash ID" approach is the simplest. You can, for instance, create or adjust a pre-receive or update hook on the server at origin for that particular Git repository. Have the hook check whether the incoming push operation would make any of the undesired hash IDs become reachable. If so, reject the push. Use a message that the person running git push can read and understand: "commit XXXXX is a mistake, please throw it out of your own Git using git reset; contact us at <url> in case of confusion".
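A rough sketch of such a check, written as a shell function for a pre-receive hook. BANNED_COMMITS and reject_banned_commits are hypothetical names introduced here; the list would hold the full hash IDs of the three unwanted commits, which are not given in the question. Git feeds a pre-receive hook one "old new ref" line per updated ref on standard input.

```shell
#!/bin/sh
# Hypothetical, space-separated list of full commit hash IDs to refuse.
BANNED_COMMITS=${BANNED_COMMITS:-}

reject_banned_commits() {
    # All-zero hash of the right length; same idiom as Git's sample hooks.
    zero=$(git hash-object --stdin </dev/null | tr '[0-9a-f]' '0')
    while read -r old new ref; do
        # A ref deletion sends an all-zero new hash; nothing to inspect.
        [ "$new" = "$zero" ] && continue
        # Commits this push would make reachable that no existing ref reaches.
        incoming=$(git rev-list "$new" --not --all) || return 1
        for bad in $BANNED_COMMITS; do
            if printf '%s\n' "$incoming" | grep -qxF -e "$bad"; then
                echo "commit $bad is a mistake, please throw it out of your own Git using git reset" >&2
                return 1
            fi
        done
    done
    return 0
}

# In an actual hook file the last line would call the function:
#   reject_banned_commits
```

To use it, the script would be saved as hooks/pre-receive inside the server-side repository and made executable, with the call on the last line uncommented. A non-zero exit from the hook makes the receiving Git refuse the whole push.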

Slightly more complicated and error-prone, you can reset your own master to make the unwanted commits unreachable, then make a new commit atop it. Then git push -f the result to origin. Now if some third Git user runs git push origin to send the original three commits, they won't slot easily in. That user would also have to force-push.

The problem you will run into is that some naive user will think: Oh, I should rebase to make my code slot in more easily! The git rebase command copies old commits that, e.g., don't slot right in, so that they become new commits with new hash IDs that do slot right in. This naive user will think that the three bad-virus commits are good, copy them to slightly-altered versions that resist your antibodies, and re-inject them into the origin repository.

This can still happen even with the "reject by hash ID" technique, but at least there you have the strongly worded message that all your users will read and understand. (Cough)

The most complicated method, which is proof against both original hash IDs and rebase-copied versions, is to write a pre-receive or update hook that recognizes the bad commits by something unique that makes them bad: not just the hash ID itself, but something that stands out about the content associated with the commit. That is, one treats the bad commits like a mutating virus, finds an appropriate signature that indicates that a commit contains the to-be-refused virus, and refuses it.

Of course, all of these methods require that you have sufficient control of the Git at origin.

"what other Git repositories have other names for one of those three hash IDs?" I probably should have mentioned in my question that I'm the only committer and that the situation I describe holds true even when I paste all 3 commands to the shell in quick succession; — David C, Dec 06 '17 at 09:44
In that case, I'd log in to the machine that your Git calls `origin`, and observe what's happening in *its* repository when you push. — torek, Dec 06 '17 at 15:51
