We develop using a main branch that has the latest developments and release branches that split off of this main branch every so often and constitute a release. Bugs are fixed on these release branches and these bugfixes are merged back to the main branch. All our changes go through PRs; you cannot manually (force) push to any of these important branches.

Now, a human error has led to the main branch being merged into the release branch (through a PR). This was reverted through a PR containing a revert commit of the erroneous merge commit. The release branch is thus "fine" (except for these two extra commits). Subsequently, this release branch was merged into the main branch. What happened next was unexpected: the erroneous merge from main to release was ignored somehow (which is logical) but the follow-up revert commit undoing the mistake was merged in all its glory, effectively removing all changes on the main branch since the release branch was split off.

I unfortunately don't have the details of how exactly this came to be, but this could be explained as "expected" behaviour somehow. I plan on writing a small script of git commands that repeat this kind of sequence as soon as I can and will update the question here.

My question is: is there a way (without force pushing and eradicating the mistake commits) to be able to merge the release branch into the main branch without the revert commit having an effect on the main branch's files? Right now it seems this will always result in the revert commit altering stuff that shouldn't be altered.

rubenvb
  • [Here's an answer](https://stackoverflow.com/a/68504135/184546) that provides some related context. It's not a dup to your question since it's about trying to re-merge the same branch, rather than bringing in the revert commit to another branch as in your case, but I believe the explanation and options in that answer may be useful to you. (In your case you almost certainly want #1 - revert the revert.) – TTT Aug 24 '21 at 19:26

1 Answer

Yes, this is normal. TL;DR: you probably wanted to revert the revert. But you were asking more about the mechanism, not a quick fix, so:

Long

The way to understand Git's merge is to understand:

  1. that Git uses (stores) snapshots;
  2. that commits are the history: they link back to older commits;
  3. what it means for a commit to be "on a branch" in the first place, and that commits are often on multiple branches;
  4. that git merge locates the merge base, i.e., the best shared commit that is on both branches; and
  5. how merge works, using the merge base and two tip commits.

The snapshot part is pretty straightforward: every commit holds a full copy of every file, as of the state it had at the time you (or whoever) made that commit.[1] There's one quirk, which is that Git makes commits from whatever is in its index AKA staging area, rather than from what's in your working tree, but that mostly just explains why you have to run git add so much.

Points 2 and 3 tie to each other: commits are the history because each commit stores the raw hash ID of some earlier commit(s). These backwards-pointing links let Git move backwards through time: from commit to parent, then from parent to grandparent, and so on. A branch name like main or master simply identifies whichever commit we want to claim is the last commit "on" the branch.

This means that you need to understand points 2 and 3 at the same time. Initially, that's not too hard, because we can draw commits like this:

... <-F <-G <-H

Here H stands in for the hash ID of the last (latest) commit. We can see that H "points back" to earlier commit G (commit H literally contains the raw hash ID of commit G). Hence G is H's parent. Meanwhile commit G contains the raw hash ID of still-earlier commit F: F is G's parent, which makes it H's grandparent.

To this drawing, we just add a branch name at the end, e.g., main points to H:

...--F--G--H   <-- main

When we add a new commit to a branch, Git:

  • makes the new commit using the snapshot in the index / staging-area;
  • wraps that with metadata saying who made the commit, that they made it now, that the parent is commit H (the current commit), and so on;
  • writes all this out to get a new random-looking hash ID that we'll call I; and—this is the tricky bit—then
  • writes I's hash ID into the name main.

The last step updates the branch, so that we have:

...--F--G--H--I   <-- main

The name main now selects I, not H; we use I to find H, which we use to find G, which we use to find F, and so on.
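As a rough illustration of the bookkeeping described above, here is a toy Python model. It is not how Git stores anything: the single-letter IDs, the `commits` and `branches` dictionaries, and the helper names are all made up for this sketch, and F is treated as a root commit only because the drawing elides everything before it.

```python
# Toy model (not Git's real storage): commit ID -> list of parent IDs.
commits = {"F": [], "G": ["F"], "H": ["G"]}
branches = {"main": "H"}  # a branch name just records one commit ID

def history(tip):
    """Follow first-parent links backwards from tip, newest first."""
    out = []
    while tip is not None:
        out.append(tip)
        parent_ids = commits[tip]
        tip = parent_ids[0] if parent_ids else None
    return out

def make_commit(branch, new_id):
    """Record new_id with the branch's current tip as parent, then move the name."""
    commits[new_id] = [branches[branch]]
    branches[branch] = new_id

make_commit("main", "I")
print(branches["main"])            # I
print(history(branches["main"]))   # ['I', 'H', 'G', 'F']
```

The only thing that "moved" is the entry in `branches`; the older commits are untouched and are found purely by walking backwards from the new tip.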

Git knows to update the name main because (or rather, if) that's the branch we are "on" when we make new commit I. If we have more than one branch name, they might all point to the same commit:

...--G--H   <-- develop, main, topic

Here all three branch names select commit H. That means it doesn't matter which one we git checkout or git switch to, in terms of what we get checked out: we get commit H checked out in any case. But if we pick develop as the name we use here, that tells Git that develop is the current name, too:

...--G--H   <-- develop (HEAD), main, topic

Note that all the commits up through and including commit H are on all three branches.

Now, when we make new commit I, the name that Git updates will be develop: that's the name that the special name HEAD is attached to. So once we make I we have:

          I   <-- develop (HEAD)
         /
...--G--H   <-- main, topic

If we make one more commit, we get:

          I--J   <-- develop (HEAD)
         /
...--G--H   <-- main, topic

Commits up through H are still on all three branches. Commits I and J are—at least currently—only on develop.
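The "which branches is this commit on" question can be sketched the same way. This is a toy model under assumed names (`parents`, `branches`, `head`, and both helpers are invented here); in real Git, `git branch --contains <commit>` answers the same question.

```python
# Toy graph matching the drawing above: commit ID -> parent IDs.
parents = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"]}
branches = {"develop": "J", "main": "H", "topic": "H"}
head = "develop"  # HEAD is attached to a branch name, not directly to a commit

def ancestors(tip):
    """Return tip plus every commit reachable by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def branches_containing(commit_id):
    """List the branch names whose tip can reach commit_id."""
    return sorted(b for b, tip in branches.items() if commit_id in ancestors(tip))

print(branches_containing("H"))  # ['develop', 'main', 'topic']
print(branches_containing("J"))  # ['develop']
```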

If we now git switch topic or git checkout topic, we move back to commit H while attaching the special name to the newly chosen branch name:

          I--J   <-- develop
         /
...--G--H   <-- main, topic (HEAD)

If we now make two more commits, it's the name topic that moves this time:

          I--J   <-- develop
         /
...--G--H   <-- main
         \
          K--L   <-- topic (HEAD)

From here, things get a little complicated and messy, but we're ready to look into the concept of a merge base now.


[1] These full copies are de-duplicated, so that if 3 commits in a row re-use hundreds of files each time, with just one file changing over and over again in the new commits, there's just one copy of each of the hundreds of files, shared across all 3 commits; it's the one changed file that has three copies, one in each of the three commits. The re-use works across all time: a new commit made today, that sets all your files back to the way they were last year, re-uses the files from last year. (Git also does delta compression, later and invisibly and in a different way than most VCSes, but the instant re-use of old files means that this is less important than it might seem.)
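The de-duplication falls out of content addressing: a file's ID is a hash of its contents, so identical contents get the same ID and are stored once. The sketch below assumes a SHA-1 repository and models only blobs; the `store` dictionary and `snapshot` helper are invented for illustration, and real Git also stores tree and commit objects and compresses everything.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Hash a file's contents the way Git names a blob in a SHA-1 repository."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

store = {}  # blob ID -> content; identical content is stored once

def snapshot(files):
    """Store every file and return a path -> blob ID mapping (a toy 'tree')."""
    tree = {}
    for path, content in files.items():
        oid = blob_id(content)
        store.setdefault(oid, content)
        tree[path] = oid
    return tree

# 100 files that never change, plus one file that changes in every commit.
stable = {f"file{i}.txt": f"contents {i}\n".encode() for i in range(100)}
c1 = snapshot({**stable, "changing.txt": b"v1\n"})
c2 = snapshot({**stable, "changing.txt": b"v2\n"})
c3 = snapshot({**stable, "changing.txt": b"v3\n"})
print(len(store))  # 103: one copy of each stable file, three of the changing one

# Going back to the old contents later re-uses the old blobs.
c4 = snapshot({**stable, "changing.txt": b"v1\n"})
print(len(store), c4 == c1)  # 103 True
```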


Merge comes in many flavors: let's look at the fast-forward merge now

Running git merge always affects the current branch, so the first step is usually to pick out the right branch. (We only get to skip this step if we're already on the right branch.) Let's say we want to check out main and merge develop, so we run git checkout main or git switch main:

          I--J   <-- develop
         /
...--G--H   <-- main (HEAD)
         \
          K--L   <-- topic

Next, we'll run git merge develop. Git is going to locate the merge base: the best commit that's on both branches. The commits that are on main are all the commits up through and including—ending at—commit H. Those that are on develop are all commits up through J, along the middle and top lines. Git actually finds these by working backwards, not forwards, but the important thing is that it finds that commits up through H are shared.

Commit H is the best shared commit because it is, in a sense, the latest.[2] This is also pretty obvious just by eyeballing the graph. But: note that commit H, the merge base, is the same commit as the commit we're sitting on right now. We're on main, which selects commit H. In git merge, this is a special case, which Git calls a fast-forward merge.[3]
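Finding the "best shared commit" can be sketched as a brute-force search over the toy graph. This is only an illustration of the definition, not Git's algorithm; the `parents` dictionary and function names are assumptions of the sketch. In a real repository, `git merge-base main develop` prints the answer.

```python
# Toy graph matching the drawing above: commit ID -> parent IDs.
parents = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}

def ancestors(tip):
    """Return tip plus every commit reachable by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_base(a, b):
    """Shared commits that are not an ancestor of any other shared commit."""
    shared = ancestors(a) & ancestors(b)
    best = [c for c in shared
            if not any(c != d and c in ancestors(d) for d in shared)]
    return sorted(best)  # a list, because there can be more than one

main_tip, develop_tip = "H", "J"
base = merge_base(main_tip, develop_tip)
print(base)                # ['H']
print(base == [main_tip])  # True: the merge base is our own tip
```

When the merge base is the current commit itself, as here, there is nothing of "ours" to combine, which is the fast-forward situation.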

In a fast-forward merge, there's no actual merge required. Git will, in this case, skip the merge, unless you tell it not to. Instead, Git will just check out the commit selected by the other branch name, and drag the current branch name to meet that and keep HEAD attached, like this:

          I--J   <-- develop, main (HEAD)
         /
...--G--H
         \
          K--L   <-- topic

Note how no new commit happened. Git just moved the name main "forward" (to the end of the top line), against the direction Git normally moves (backwards from commit to parent). That's the fast-forward in action.
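In the toy model, a fast-forward is nothing more than moving a name, and it is allowed only when the current tip is an ancestor of the other tip. The names below (`parents`, `branches`, `is_ancestor`, `try_fast_forward`) are invented for this sketch; the real options that control this behaviour are `git merge --ff-only` and `git merge --no-ff`.

```python
# Toy graph matching the drawing above: commit ID -> parent IDs.
parents = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}
branches = {"main": "H", "develop": "J", "topic": "L"}

def is_ancestor(old, new):
    """True if old can be reached from new by following parent links."""
    stack, seen = [new], set()
    while stack:
        c = stack.pop()
        if c == old:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False

def try_fast_forward(current, other):
    """Move the current branch name forward if no real merge is needed."""
    if is_ancestor(branches[current], branches[other]):
        branches[current] = branches[other]
        return True
    return False

commit_count = len(parents)
print(try_fast_forward("main", "develop"), branches["main"])  # True J
print(try_fast_forward("main", "topic"), branches["main"])    # False J
print(len(parents) == commit_count)                           # True: no new commit
```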

You can force Git to do a real merge for this particular case, but for our illustration purposes, we won't do that (it doesn't help your own case any). Instead, we'll now go on to do another merge where Git can't do a fast-forward. We will now run git merge topic.


[2] Latest here is not defined by dates but rather by the position in the graph: H is "closer to" J than G is, for instance. Technically, the merge base is defined by solving the Lowest Common Ancestor problem as extended for a Directed Acyclic Graph, and in some cases, there can be more than one merge base commit. We'll carefully ignore this case, hoping it never comes up, as it's rather complicated. Find some of my other answers to see what Git does when it does come up.

[3] Fast-forwarding is actually a property of label motions (branch names or remote-tracking names), rather than merges, but when you achieve this using git merge, Git calls it a fast-forward merge. When you get it with git fetch or git push, Git calls that a fast-forward, but usually says nothing; when it can't happen for fetch or push, you get a non-fast-forward error in some cases. I'll leave these out of this answer, though.


Real merges are harder

If we now run git merge topic, Git must once again find the merge base, i.e., the best shared commit. Remember that we're now in this situation:

          I--J   <-- develop, main (HEAD)
         /
...--G--H
         \
          K--L   <-- topic

Commits up through J are on main, our current branch. Commits up through H, plus K-L, are on topic. So which commit is the best shared commit? Well, work backwards from J: you start at J, then hit commit I, then H, then G, and so on. Now work backwards from L to K to H: commit H is shared, and it's the "furthest to the right" / latest possible shared commit, since G comes before H. So the merge base is once again commit H.

This time, though, commit H isn't the current commit: the current commit is J. So Git can't use the fast-forward cheat. Instead, it has to do a real merge. Note: this is where your original question came in. Merge is about combining changes. But commits themselves don't hold changes. They hold snapshots. How do we find what changed?

Git could compare commit H to commit I, then commit I to commit J, one at a time, to see what changed on main. That's not what it does though: it takes a somewhat different shortcut and compares H directly to J. It wouldn't really matter if it did go one commit at a time, though, because it is supposed to take all changes, even if one of those changes is "undo some change" (git revert).

The Git command that compares two commits is git diff (if you give it two commit hash IDs, anyway). So this is essentially equivalent to:[4]

git diff --find-renames <hash-of-H> <hash-of-J>   # what we changed

Having figured out what you changed since the common starting point, Git now needs to figure out what they changed, which is of course just another git diff:

git diff --find-renames <hash-of-H> <hash-of-L>   # what they changed

The job for git merge is now to combine these two sets of changes. If you changed line 17 of the README file, Git takes your update to line 17 of README. If they added a line after line 40 of main.py, Git takes their addition to main.py.

Git takes each of these changes—yours and theirs—and applies those changes to the snapshot in commit H, the merge base. That way, Git keeps your work and adds theirs—or, by the same argument, Git keeps their work and adds yours.

Note that if you did a revert somewhere after commit H, and they didn't, your revert is a change since the merge base, and they did not change anything since the merge base. So Git picks up the revert, too.

In some cases, you and they may have changed the same lines of the same file, but in a different way. You might have changes that conflict, in other words.[5] For those cases, Git declares a merge conflict and leaves you with a mess that you must clean up yourself. But in a surprising number of cases, Git's merge just works by itself.
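The combining rule can be sketched at whole-file granularity. This is a deliberate simplification: real Git merges line ranges within each file, whereas this toy treats any two different edits to the same file as a conflict. The function name, the file contents, and the choice to keep "ours" as a placeholder on conflict are all assumptions of the sketch.

```python
def three_way_merge(base, ours, theirs):
    """Combine two snapshots relative to their merge base, one file at a time."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree, or neither touched the file
            result = o
        elif o == b:      # only they changed it: take theirs
            result = t
        elif t == b:      # only we changed it: keep ours
            result = o
        else:             # both changed it differently
            conflicts.append(path)
            result = o    # placeholder; a human has to resolve this
        if result is not None:  # None means the file is absent or deleted
            merged[path] = result
    return merged, conflicts

base   = {"README": "old line 17", "main.py": "40 lines", "util.py": "original"}
ours   = {"README": "new line 17", "main.py": "40 lines", "util.py": "our edit"}
theirs = {"README": "old line 17", "main.py": "41 lines", "util.py": "their edit"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged["README"], "|", merged["main.py"])  # new line 17 | 41 lines
print(conflicts)                                 # ['util.py']
```

Note that the rule never asks why a side changed something. A change that happens to undo earlier work is treated like any other change.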

If Git is able to merge everything successfully on its own—or even if not, but as long as it thinks it did—Git will normally go on to make a new commit of its own. This new commit is special in exactly one way, but let's draw it first:

          I--J   <-- develop
         /    \
...--G--H      M   <-- main (HEAD)
         \    /
          K--L   <-- topic

Note how the name main is dragged forward one hop, as usual for any new commit, so that it points to the new commit Git just made. Commit M has a snapshot, just like any other commit. The snapshot is made from the files in Git's index / staging-area, just like any other commit.[6]

In fact, the only thing special about new merge commit M is that instead of just one parent commit J, it has two. To the usual first parent, Git adds a second parent, L. That's the commit we named in our git merge command. Note that none of the other branch names is affected: only the name main is updated, because it's the current branch. And, because the set of commits that are "on" a branch is found by working backwards from the last commit, now all commits are on main. We start at M, then we go back one hop to both commits J and L. From here, we move back one hop to both commits I and K. From there, we move back one hop to commit H: the moving-back-one-hop resolves this "multiple paths" problem at the point where the branches diverged earlier.


[4] The --find-renames part handles the case where you used git mv or equivalent. Merge turns on rename-finding automatically; git diff turns it on automatically by default in recent versions of Git, but in old ones, you need an explicit --find-renames.

[5] Git also declares a conflict if you changed a region that just touches (abuts) a region they changed. In some cases, there may be ordering constraints; in general, the people who work on merge software have found this gives the best overall results, producing conflicts when appropriate. You might occasionally get a conflict when one isn't really required, or not get one when there is a conflict, but in practice, this simple line-by-line rule works pretty well for most programming languages. (It tends to work less well for textual stuff like research papers, unless you get in the habit of putting each sentence or independent clause on its own line.)

[6] This means that if you have to resolve a conflict, you're actually doing this in Git's index / staging-area. You can use the working tree files to do it (that's what I usually do) or you can use the three input files, which Git leaves behind in the staging area to mark the conflict. We won't go into the details of any of this here, though, as this is just an overview.


Real merges leave traces

Now that we have this:

          I--J   <-- develop
         /    \
...--G--H      M   <-- main (HEAD)
         \    /
          K--L   <-- topic

we can git checkout topic or git switch topic and do more work on it:

          I--J   <-- develop
         /    \
...--G--H      M   <-- main
         \    /
          K--L   <-- topic (HEAD)

becomes:

          I--J   <-- develop
         /    \
...--G--H      M   <-- main
         \    /
          K--L---N--O   <-- topic (HEAD)

for instance. If we now git checkout main or git switch main, and run git merge topic again, what's the merge base commit?

Let's find out: from M, we go back to both J and L. From O, we go back to N, and then to L. Aha! Commit L is on both branches.

Commit K is on both branches, too, and so is commit H; but commits I-J aren't as we have to follow the "backward arrows" from commits and there is no link from L to M, only from M backwards to L. So from L we can get to K and then H, but we can't get to M that way, and there is no path to J or I. Commit K is clearly inferior to L, and H is inferior to K, and so on, so commit L is the best shared commit.
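The effect of the merge commit's second parent on the merge base can be checked with the same kind of brute-force search. Again this is a toy model with invented names, not Git's implementation.

```python
# Toy graph: M is the merge commit, with first parent J and second parent L.
parents = {
    "G": [], "H": ["G"], "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"], "M": ["J", "L"],
    "N": ["L"], "O": ["N"],
}

def ancestors(tip):
    """Return tip plus every commit reachable by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_base(a, b):
    """Shared commits that are not an ancestor of any other shared commit."""
    shared = ancestors(a) & ancestors(b)
    return sorted(c for c in shared
                  if not any(c != d and c in ancestors(d) for d in shared))

print(merge_base("J", "O"))  # ['H']: what the base would be without merge commit M
print(merge_base("M", "O"))  # ['L']: M's second parent makes L reachable from main
print("I" in ancestors("O"), "I" in ancestors("M"))  # False True
```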

What this means is that our next git merge topic runs its two diffs as:

git diff --find-renames <hash-of-L> <hash-of-M>   # what we changed
git diff --find-renames <hash-of-L> <hash-of-O>   # what they changed

The "what we changed" part is basically rediscovering what we brought in from I-J, while the "what they changed" part figures out, quite literally, what they changed. Git combines these two sets of changes, applies the combined changes to the snapshot from L, and makes a new snapshot:

          I--J   <-- develop
         /    \
...--G--H      M------P   <-- main (HEAD)
         \    /      /
          K--L---N--O   <-- topic

Note that a fast-forward was not possible this time, as main identified commit M (the merge), not commit L (the merge base).

Should we do more development on topic later, and merge again, the future merge base will now be commit O. We don't have to repeat the old merge work except for the propagation of the difference from L to M (now preserved as the difference from O to P).

There are still more merge variants

We won't touch on git rebase—which, because it is repeated cherry-picking, is a form of merging (each cherry-pick is itself a merge)—but let's look briefly at git merge --squash. Let's start with this:

          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

so that it's clear that the merge base is commit H and that we are on commit J. We now run git merge --squash branch2. This locates the merge base H and the other tip commit L as before, does two git diffs as before, and combines work as before. But this time, instead of making a merge commit M, Git stops with the combined result in the index, and we run git commit to make a regular commit, which I will call S (for squash), that we draw like this:

          I--J--S   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

Note how S does not connect back to commit L at all. Git never remembers how we got S. S just has a snapshot that was made by the same process that would have made a merge commit M.

If we now add more commits to branch2:

          I--J--S   <-- branch1
         /
...--G--H
         \
          K--L-----N--O   <-- branch2 (HEAD)

and run git checkout branch1 or git switch branch1 and then git merge branch2 again, the merge base will be commit H again. When Git compares H vs S, it will see that we made all the same changes they made in L, plus whatever we made in I-J; when Git compares H vs O, it will see that they made all the changes they made in the whole sequence K-L-N-O; and Git will now have to combine our changes (that contain some of their changes from before) with all of their changes (that likewise contain some of their changes from before).
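One way to see why the squash makes later merges harder is to list which of "their" commits are not reachable from "our" tip, which is roughly what `git log branch1..branch2` reports. The two toy graphs below differ only in whether the combined commit has one parent (the squash commit S) or two (a real merge commit M); all names are assumptions of this sketch.

```python
# Commits common to both scenarios: commit ID -> parent IDs.
common = {
    "G": [], "H": ["G"], "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"], "N": ["L"], "O": ["N"],
}
squash_graph = {**common, "S": ["J"]}       # squash commit: single parent
merge_graph = {**common, "M": ["J", "L"]}   # real merge: two parents

def ancestors(graph, tip):
    """Return tip plus every commit reachable by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen

def only_on_theirs(graph, ours, theirs):
    """Commits reachable from theirs but not from ours."""
    return sorted(ancestors(graph, theirs) - ancestors(graph, ours))

print(only_on_theirs(squash_graph, "S", "O"))  # ['K', 'L', 'N', 'O']
print(only_on_theirs(merge_graph, "M", "O"))   # ['N', 'O']
```

After the squash, K and L still count as unmerged as far as the graph is concerned, even though S already contains their changes.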

This does work, but the risk of merge conflicts goes up. If we keep using git merge --squash, the risk of merge conflicts goes way up, in most cases. As a general rule, the only thing to do after a squash like this is to drop branch2 entirely:

          I--J--S   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   ???

Commit S holds all the same changes as K-L, so we drop branch2, forgetting how to find commits K-L. We never look back for them, and eventually—after a long time—Git will really throw them out for real and they will be gone forever, provided nobody else made any names (branch or tag names) that let Git find them. It will seem as though history always went like this:

...--G--H--I--J--S--...   <-- somebranch

Summary

  • Fast-forward merges don't leave traces (and don't do any actual merging).
  • Real merges leave traces: a merge commit with two parents. The merge operation—the action of merging, or merge as a verb—uses the merge base to figure out what goes in the merge commit (merge as an adjective).
  • Squash merges leave no traces, and generally mean you should kill off the squashed branch.
  • A revert is just a normal everyday commit, so merging a revert merges the reversion. You can revert the revert, either before or after merging, to undo it.
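To tie this back to the question, here is a toy replay of the sequence described there: main is merged into release by mistake, that merge is reverted on release, and release is then merged into main. Everything in it is an assumption made for illustration: the commit IDs, the file names, the whole-file merge rule (real Git merges line by line), and modelling "revert the revert" as applying the difference from V back to X on top of main.

```python
parents = {}    # commit ID -> list of parent IDs
snapshots = {}  # commit ID -> {path: content}

def add(cid, parent_ids, files):
    parents[cid] = parent_ids
    snapshots[cid] = files

def ancestors(tip):
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_base(a, b):
    shared = ancestors(a) & ancestors(b)
    best = [c for c in shared
            if not any(c != d and c in ancestors(d) for d in shared)]
    assert len(best) == 1  # this toy history always has exactly one merge base
    return best[0]

def combine(base, ours, theirs):
    """Whole-file three-way merge; this toy history has no conflicts."""
    out = {}
    for path in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        result = t if o == b else o  # take their side unless we changed the file
        if result is not None:
            out[path] = result
    return out

def merge(cid, ours, theirs):
    base = merge_base(ours, theirs)
    add(cid, [ours, theirs],
        combine(snapshots[base], snapshots[ours], snapshots[theirs]))

add("B", [], {"app.txt": "v1"})                         # release branches off here
add("M1", ["B"], {"app.txt": "v1", "feature.txt": "new"})  # new work on main
add("R1", ["B"], {"app.txt": "v1-fixed"})               # bugfix on release
merge("X", "R1", "M1")                                  # mistake: main merged into release
add("V", ["X"], dict(snapshots["R1"]))                  # revert of X on release
add("M2", ["M1"], {**snapshots["M1"], "more.txt": "x"})  # main keeps moving
merge("P", "M2", "V")                                   # release merged into main

print(merge_base("M2", "V"))   # M1
print(sorted(snapshots["P"]))  # ['app.txt', 'more.txt']: feature.txt is gone

# Revert the revert on main: apply the V -> X difference on top of P.
add("VV", ["P"], combine(snapshots["V"], snapshots["P"], snapshots["X"]))
print(sorted(snapshots["VV"]))  # ['app.txt', 'feature.txt', 'more.txt']
print(merge_base("VV", "V"))    # V
```

In this model the merge base for the release-into-main merge is M1, so relative to that base the release side has "deleted" feature.txt, and the merge faithfully carries the deletion over. After the revert of the revert, the merge base of main and release is V, so later merges from release no longer see X or V as changes to bring in.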
torek
  • That's quite a blog post you penned down :). This will be useful for many people I hope. "Note that if you did a revert somewhere after commit H, and they didn't, your revert is a change since the merge base, and they did not change anything since the merge base. So Git picks up the revert, too." confirms my suspicion that the revert is picked up as a change to be merged. So will reverting the revert commit on the main branch end this once and for all and allow us to merge new changes to the release branch later on without any issues? – rubenvb Aug 24 '21 at 12:05
  • @rubenvb yes, reverting the revert will fix it. – TTT Aug 24 '21 at 19:12