[git reset --hard <hash> and git push -f origin master succeeded, but]
1) those commits are still there.
Yes. Those commits are permanent. Those three hash IDs represent those three commits, and vice versa. What one does in Git is to take away your own name for those commits (git reset, which changes your master so that it does not name one of those three hash IDs), and then take away someone else's name as well (git push -f origin master), so that their master does not name one of those hash IDs.
(I don't know the hash ID or I would use that here. A hash ID is the "true name" of the commit; names like master are just symbols that allow us, or Git, to find one specific commit, whose parent(s) find more commit(s), whose parents find yet more commits, and so on. This is the "reachability" idea: a commit that is reachable from a name by this parent-following process (note that any name will do) is "alive" in the repository. A commit that is not reachable is not quite alive: it's either in the process of eventually dying, or in the process of being created.)
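To make the reachability idea concrete, here is a minimal sketch that can be run in a scratch directory. The throwaway repository, the commit messages, and the variable names are all assumptions made up for illustration; only the Git commands themselves are real.

```shell
#!/bin/sh
# Scratch demo: a commit dropped by "git reset --hard" is no longer
# reachable from master, yet it can still be looked up by hash ID.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)                        # throwaway directory
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master  # make sure the branch is named master

git commit -q --allow-empty -m "good commit"
good=$(git rev-parse HEAD)
git commit -q --allow-empty -m "mistake"
bad=$(git rev-parse HEAD)

git reset -q --hard "$good"              # master no longer names $bad

# Reachability: is $bad an ancestor of (or the same as) master's commit?
if git merge-base --is-ancestor "$bad" master; then
    reachable=yes
else
    reachable=no
fi

# Existence: the object is still stored and answers to its hash ID.
kind=$(git cat-file -t "$bad")

echo "reachable from master: $reachable"
echo "object type by hash:   $kind"
```

The commit stays available here because nothing has expired it yet; in particular, the reflog for HEAD still remembers it.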
Your own remote-tracking name, origin/master, retained the old hash ID until you ran git push -f origin master. When your Git saw that their Git had accepted the name-update, your Git updated your copy of their name. That took away the second name you had for that hash ID. If that was all of your names, as it probably was, that suffices for your repository.
If the Git at origin had only one name making those three commits reachable, that one name being master, you have updated that Git / repository so that it no longer has any names for those commits.
The real question at this point is then: what other Git repositories have other names for one of those three hash IDs?
... they are a mistake and I don't want to keep them in the repo. How do I erase them completely from the repo?
In general, you don't: it's normally a futile effort. A commit that is unreachable from any name—branch or tag name, or other Git internal name—will eventually expire and be "garbage collected". This is what makes the commit truly go away. Until then, though, anyone with the hash ID can look up the commit by hash ID.
If no other Git has picked up those three hash IDs, then you have removed the reachability of those three hash IDs from both affected repositories, which is all you really need.
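For a single repository that you control, the following sketch shows how one might check that no name still reaches a commit, and then hurry the expiry along instead of waiting for it. It assumes a throwaway repository created on the spot; the variable names are hypothetical. It does nothing about copies held by any other Git.

```shell
#!/bin/sh
# Scratch demo: list the names that still reach a commit, then expire
# the reflogs and prune so the unreachable commit is collected right away.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m "good commit"
good=$(git rev-parse HEAD)
git commit -q --allow-empty -m "mistake"
bad=$(git rev-parse HEAD)
git reset -q --hard "$good"

# Branch, tag, and remote-tracking names from which $bad is reachable.
# An empty result means no such name keeps it alive.
names=$(git for-each-ref --contains "$bad" --format='%(refname)')

# The reflogs still mention $bad, so drop those entries first,
# then prune unreachable objects without the usual grace period.
git reflog expire --expire=now --all
git gc -q --prune=now

# Does the object database still hold the commit?
if git cat-file -e "$bad" 2>/dev/null; then
    gone=no
else
    gone=yes
fi

echo "names reaching the commit: ${names:-<none>}"
echo "commit removed locally:    $gone"
```

On a hosted service you typically cannot run these commands on the server side yourself, which is one more reason the effort is usually futile.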
If anyone else does have the three commits, that other Git can offer them (by hash ID and a suggested name) to your Git and/or to the Git that you call origin. Your Git and origin's Git, being Gits, will very likely say, in Borg fashion, "your technological distinctiveness will be added to my repository" and take them in again. It's very difficult to be rid of a commit forever, because once it is made, Git really doesn't want to be rid of it.
2) when I run git pull origin master the repo reverts to the state depicted in image 1. I thought I had synchronized both the local and the remote repos! Why is this happening? How do I ensure that commit 3e2fb0... is the 'latest' commit?
If your own git push -f origin master succeeded, but a later git pull origin master brought the three commits back, that means that the Git over on origin decided that its master should point to the tip-most of those three commits.
What would make that happen? Well, suppose there's a third Git that has those three commits, and probably a name for one of them. The user of that third Git runs git push <url> <name>:master or even git push <url> <hash-id>:master. That makes their (third) Git call up the Git at <url>, which is the Git you call origin, and offer to it that particular commit by its hash ID.
The Git at origin takes that hash ID (and re-obtains the commit itself if necessary, if it has managed to garbage-collect the copy it got from you) and looks at its parent IDs, and the parents' parent IDs, and so on, and sees whether it can add that commit's technological distinctiveness to its repository. Sure enough, it can: those three commits easily slot right in atop its own master! So the Git at origin takes those commits, adding them to its repository, and updates its master to point to the tip-most one.
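The sequence just described can be reproduced with three throwaway repositories. This is only a sketch of one way it could happen: the directory names mine and third, the local bare repository standing in for origin, and the single "mistake" commit (rather than three) are all assumptions for illustration.

```shell
#!/bin/sh
# Scratch demo: a third clone re-introduces a retracted commit with an
# ordinary (non-forced) push, and a later pull brings it back.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git            # stand-in for the Git at origin
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# "mine" publishes a good commit followed by the mistake.
git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "good commit"
good=$(git rev-parse HEAD)
git commit -q --allow-empty -m "mistake"
bad=$(git rev-parse HEAD)
git push -q origin master
cd ..

# The third Git clones while the mistake is still published.
git clone -q "$work/origin.git" third

# "mine" retracts the mistake locally and on origin.
cd mine
git reset -q --hard "$good"
git push -q -f origin master
cd ..

# The third Git pushes its master: a plain fast-forward from $good to $bad.
cd third
git push -q origin master:master
cd ..

# A later pull in "mine" brings the mistake straight back.
cd mine
git pull -q --ff-only origin master
after_pull=$(git rev-parse HEAD)
origin_tip=$(git --git-dir="$work/origin.git" rev-parse master)
echo "mine is back at the mistake: $([ "$after_pull" = "$bad" ] && echo yes || echo no)"
```

Note that the third push needs no force flag at all, which is why it tends to go unnoticed.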
This is what makes it so hard to get rid of these things. A commit, once made, easily fits into every clone of a repository, since it matches up with at least one clone. Every Git that calls up another Git tends to get a copy of its latest commits, which spread like viruses through the technologically-grabby Borg-ness of Git.
To really, truly, get rid of a commit, you have to make it unreachable in your repository and unreachable in every other repository that has it so that none of these Gits offers it back to any other Git at any point. Otherwise, like some stubborn virus, it just keeps coming back.
Things you can do if you have a controlled central repository
There are a few things you can do if everyone agrees that some central repository is the "source of truth". For instance, suppose the Git you call origin is on your own master server, or even on GitHub or Bitbucket or some other distribution point, and you have sufficient control of that server/Git. You can arrange for that Git to reject the bad commit by its hash ID, or by some other trickery.
The "refuse by hash ID" approach is the simplest. You can, for instance, create or adjust a pre-receive or update hook on the server at origin for that particular Git repository. Have the hook check whether the incoming push operation would make any of the undesired hash IDs become reachable. If so, reject the push. Use a message that the person running git push can read and understand: "commit XXXXX is a mistake, please throw it out of your own Git using git reset; contact us at <url> in case of confusion".
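Here is a minimal sketch of such a pre-receive hook, wrapped in a scratch demo so it can be tried safely. The file name banned-commits, the directory names, and the exact message are assumptions of this example, not anything Git defines; a hosted service may not let you install hooks like this at all.

```shell
#!/bin/sh
# Scratch demo of a "refuse by hash ID" pre-receive hook.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git            # stand-in for the server repository
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

mkdir -p origin.git/hooks
cat > origin.git/hooks/pre-receive <<'EOF'
#!/bin/sh
# stdin carries one "<old-id> <new-id> <ref-name>" line per updated ref.
banned_file="$(git rev-parse --git-dir)/banned-commits"  # hypothetical file
while read -r old new ref; do
    # An all-zero new ID means the ref is being deleted: nothing to check.
    case "$new" in *[!0]*) ;; *) continue ;; esac
    # Commits this update would make newly reachable, filtered down to
    # those whose full hash ID appears in the banned list.
    hit=$(git rev-list "$new" --not --all | grep -F -x -f "$banned_file" | head -n 1)
    if [ -n "$hit" ]; then
        echo "commit $hit is a mistake; please remove it with git reset" >&2
        exit 1
    fi
done
exit 0
EOF
chmod +x origin.git/hooks/pre-receive
: > origin.git/banned-commits            # start with an empty ban list

git init -q client
cd client
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "good commit"
good=$(git rev-parse HEAD)
git push -q origin master                # accepted: nothing is banned yet

git commit -q --allow-empty -m "mistake"
bad=$(git rev-parse HEAD)
echo "$bad" > "$work/origin.git/banned-commits"

if git push -q origin master 2>"$work/push.err"; then
    result=accepted
else
    result=rejected
fi
origin_tip=$(git --git-dir="$work/origin.git" rev-parse master)
echo "push of the banned commit was $result"
```

Because the hook looks at everything the push would make newly reachable, it also catches the banned commit when it arrives buried under newer commits.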
Slightly more complicated and error-prone: you can reset your own master to make the unwanted commits unreachable, then make a new commit atop it. Then git push -f the result to origin. Now if some third Git user runs git push origin to send the original three commits, they won't slot easily in. That user would also have to force-push.
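A sketch of this second technique, again with throwaway repositories: the directory names, the empty commits, and the "replacement commit" message are assumptions chosen to keep the example short.

```shell
#!/bin/sh
# Scratch demo: after a reset plus a new commit and a forced push,
# a third clone's ordinary push of the old commit is refused.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "good commit"
good=$(git rev-parse HEAD)
git commit -q --allow-empty -m "mistake"
git push -q origin master
cd ..

# The third Git clones while the mistake is still published.
git clone -q "$work/origin.git" third

# Retract the mistake, then add a new commit on top before force-pushing.
cd mine
git reset -q --hard "$good"
git commit -q --allow-empty -m "replacement commit"
fixed=$(git rev-parse HEAD)
git push -q -f origin master
cd ..

# The third Git's master is no longer a fast-forward of origin's master,
# so an ordinary push is rejected.
cd third
if git push -q origin master 2>/dev/null; then
    third_push=accepted
else
    third_push=rejected
fi
origin_tip=$(git --git-dir="$work/origin.git" rev-parse master)
echo "third Git's plain push was $third_push"
```

The rejection here is only the usual non-fast-forward check, so it stops accidents but not a determined git push -f.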
The problem you will run into is that some naive user will think: Oh, I should rebase to make my code slot in more easily! The git rebase command copies old commits that, e.g., don't slot right in, so that they become new commits with new hash IDs that do slot right in. This naive user will think that the three bad-virus commits are good, copy them to slightly-altered versions that resist your antibodies, and re-inject them into the origin repository.
This can still happen even with the "reject by hash ID" technique, but at least there you have the strongly worded message that all your users will read and understand. (Cough)
The most complicated method, which is proof against both original hash IDs and rebase-copied versions, is to write a pre-receive or update hook that recognizes the bad commits by something unique that makes them bad: not just the hash ID itself, but something that stands out about the content associated with the commit. That is, one treats the bad commits like a mutating virus, finds an appropriate signature that indicates that a commit contains the to-be-refused virus, and refuses it.
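One candidate for such a signature is the patch ID that git patch-id computes from a commit's diff; this is a suggestion of this example, not something the approach above requires. The sketch below, using a made-up repository and file names, shows that a rebase-copied commit gets a new hash ID but keeps the same patch ID as long as the copied change itself is unmodified.

```shell
#!/bin/sh
# Scratch demo: a rebased copy of a commit has a different hash ID
# but the same patch ID, which a hook could use as a signature.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master

# Print the patch ID (a hash of the diff, not of the commit) for one commit.
patch_id_of() {
    git show "$1" | git patch-id --stable | cut -d' ' -f1
}

echo base > README
git add README
git commit -q -m "base"

# The unwanted commit, made on a side branch.
git checkout -q -b side
echo "unwanted content" > unwanted.txt
git add unwanted.txt
git commit -q -m "mistake"
bad=$(git rev-parse HEAD)

# Meanwhile master moves on, so a rebase has to copy the commit.
git checkout -q master
echo more > other.txt
git add other.txt
git commit -q -m "unrelated work"

git checkout -q side
git rebase -q master
copy=$(git rev-parse HEAD)

bad_sig=$(patch_id_of "$bad")
copy_sig=$(patch_id_of "$copy")

echo "original commit: $bad  patch ID: $bad_sig"
echo "rebased copy:    $copy  patch ID: $copy_sig"
```

A pre-receive hook could compute the same value for each commit listed by git rev-list "$new" --not --all and refuse any whose patch ID is on a ban list. A user who edits the change while rebasing would produce a different patch ID, so a content-specific check may still be needed.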
Of course, all of these methods require that you have sufficient control of the Git at origin.