Merge conflicts happen during merges, not after them.
The conflict part is really very simple. A conflict occurs in file F when:
- "our" changes (the difference from the merge base commit to our HEAD commit) include a change to file F, and
- "their" changes (the difference from the same merge base commit to their tip commit) also include a change to file F, and
- our changes and their changes overlap or abut, but are not identical.
To understand this, you need to:
- understand the output of git diff; and
- understand what a merge base is.
The output of git diff is pretty straightforward, really, but it requires remembering that each commit holds a snapshot of all of your files. This means we must give git diff two snapshots: an old one, and a new one. That's two "pictures" of what the files were like at two points in time. Git then plays a game of Spot the Difference: it tells you that to go from the left-side snapshot to the right-side snapshot, you must make some set of changes to some set of files. These changes may involve renaming some files; they might involve adding new files; they might involve deleting files; and they might involve deleting some particular lines from some files, and adding some lines to some files, in some particular places.
The output of git diff is not necessarily anything any person did. It's just a set of changes that, if applied to the left-side snapshot, gets you the right-side snapshot. The "left side" here is the left argument to git diff and the "right side" is the right argument, when you use:
git diff <hash1> <hash2>
where the two hashes are the hash IDs of commits. (This is what git merge does, in effect, although it does all of this internally.) The diff engine is designed to produce a small, ideally minimal, set of changes that gives the right effect. This usually matches what someone actually did, but not always.
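To make the left-side/right-side idea concrete, here is a minimal sketch that builds a throwaway repository with two snapshots and diffs them in both directions. The file name, commit messages, and variable names are made up for illustration.

```shell
set -eu
cd "$(mktemp -d)"                       # scratch directory, safe to delete
git init -q
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"

# Old snapshot.
printf 'one\ntwo\nthree\n' > file.txt
git add file.txt
git commit -q -m "old snapshot"
old=$(git rev-parse HEAD)

# New snapshot: the middle line is replaced.
printf 'one\n2\nthree\n' > file.txt
git commit -q -a -m "new snapshot"
new=$(git rev-parse HEAD)

# Left argument = old snapshot, right argument = new snapshot.
forward=$(git diff "$old" "$new")
# Swapping the arguments gives the recipe for going the other way.
backward=$(git diff "$new" "$old")

printf '%s\n' "$forward"
```

The forward diff removes the line "two" and adds "2"; the backward diff does the opposite. Neither records who typed what, only how to turn one snapshot into the other.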
The last, but probably trickiest, part of understanding git merge is the concept of a merge base. Technically, the merge base is the (single) commit that emerges from an algorithm that finds the Lowest Common Ancestor (LCA) of nodes chosen from a Directed Acyclic Graph (DAG). Not all DAG node pairs (or sets) have an LCA: some have none, and some have more than one. It's pretty common for your Git's commit graph to have a single LCA here, though, and git merge has some methods for dealing with multiple LCAs. (When there is no LCA, the modern git merge refuses to run by default, telling you that the two branches have unrelated histories. Old Git ran the merge anyway, and you can make modern Git do the merge anyway; in this case, Git uses a synthetic commit with no files as the merge base.)
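The no-LCA case can be reproduced in a scratch repository. The sketch below assumes a reasonably modern Git (the --allow-unrelated-histories option appeared in Git 2.9); the branch names main and other and the file names are invented for the demo.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch explicitly
git config user.name "Example"
git config user.email "example@example.com"

printf 'a\n' > a.txt
git add a.txt
git commit -q -m "root of main"

# Start a second history with its own root commit (no shared ancestor).
git checkout -q --orphan other
git rm -q -rf .
printf 'b\n' > b.txt
git add b.txt
git commit -q -m "root of other"
git checkout -q main

# git merge-base exits non-zero when there is no common ancestor.
if git merge-base main other >/dev/null; then has_base=yes; else has_base=no; fi

# By default, git merge refuses to combine unrelated histories.
if git merge -m "join" other >/dev/null 2>&1; then refused=no; else refused=yes; fi

# Explicitly asking for the merge makes Git go ahead anyway.
git merge -q --allow-unrelated-histories -m "join" other
words=$(git rev-list --parents -n 1 HEAD | wc -w)   # commit hash + parent hashes

echo "merge base found: $has_base, refused by default: $refused"
```

Because the two histories touch different files, the forced merge completes without conflicts and the working tree ends up with both a.txt and b.txt.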
The important part here is having a conceptual "feel" for the merge base. For some graphs, this is easy. Consider for instance the case of a Git commit graph where your two branches simply fork from a common ancestor commit whose hash ID is H:
          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2
Here, when merging branch1 and branch2 (which means commits J and L), the common starting point is clearly commit H. So git merge will run two git diff commands, and the merge base in each will be H:
git diff --find-renames <hash-of-H> <hash-of-J> # what we changed on branch1
git diff --find-renames <hash-of-H> <hash-of-L> # what they changed on branch2
Git will now combine the sets of changes produced by these two git diff commands. Where they overlap, but don't make the same change, is where you will get merge conflicts.
Git will apply the combined changes to the snapshot in H. Applying your changes to this snapshot results in commit J; applying their changes results in commit L; applying the combined changes results in, well, the combination.
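The fork-shaped graph and the two diffs can be tried out in a scratch repository. This is only a sketch: the file names and contents are invented, and commit messages H, I, J, K, L are used simply to mirror the drawing.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example"
git config user.email "example@example.com"

# Commit H: the common starting point.
printf 'alpha\nbeta\ngamma\n' > F.txt
printf 'notes\n' > other.txt
git add .
git commit -q -m H
h=$(git rev-parse HEAD)
git branch branch2

# Commits I and J on branch1 ("ours").
printf 'ALPHA changed by us\nbeta\ngamma\n' > F.txt
git commit -q -a -m I
printf 'new file\n' > ours.txt
git add ours.txt
git commit -q -m J

# Commits K and L on branch2 ("theirs").
git checkout -q branch2
printf 'more notes\n' >> other.txt
git commit -q -a -m K
printf 'alpha\nbeta\nGAMMA changed by them\n' > F.txt
git commit -q -a -m L
git checkout -q branch1

# Ask Git for the merge base, then run the two diffs from it.
base=$(git merge-base branch1 branch2)
ours=$(git diff --find-renames --name-only "$base" branch1)
theirs=$(git diff --find-renames --name-only "$base" branch2)

printf 'base: %s\nours:\n%s\ntheirs:\n%s\n' "$base" "$ours" "$theirs"
```

Here the merge base is commit H, and F.txt shows up in both lists, so it is the only file where the two sets of changes could collide. Dropping --name-only shows the actual hunks.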
If there are no conflicts, Git will be able to combine the changes on its own. Having applied the combined changes, Git will commit the result on its own, as a new merge commit M:
          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L   <-- branch2
and this will be your merge result.
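A conflict-free merge is easy to try in a scratch repository. In this sketch both sides edit the same file, but at opposite ends, so the changes neither overlap nor abut. The file contents and variable names are made up.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example"
git config user.email "example@example.com"

# Merge base H.
printf '1\n2\n3\n4\n5\n6\n7\n' > F.txt
git add F.txt
git commit -q -m H
git branch branch2

# Our side changes the first line.
printf 'one\n2\n3\n4\n5\n6\n7\n' > F.txt
git commit -q -a -m J
j=$(git rev-parse HEAD)

# Their side changes the last line.
git checkout -q branch2
printf '1\n2\n3\n4\n5\n6\nseven\n' > F.txt
git commit -q -a -m L
l=$(git rev-parse HEAD)
git checkout -q branch1

# No overlap, so Git combines the changes and commits M by itself.
git merge -q -m M branch2
merged=$(cat F.txt)
p1=$(git rev-parse HEAD^1)   # first parent: our previous tip
p2=$(git rev-parse HEAD^2)   # second parent: their tip

printf '%s\n' "$merged"
```

The merged F.txt starts with "one" and ends with "seven", and the new commit has the two branch tips as its first and second parents.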
If the combining fails, Git stops in the middle of the merge. Your job is now to finish the merge (combine the changes yourself), then tell Git you've done it and to make the merge commit. If this is too big a mess, you can tell Git to abort the merge entirely (git merge --abort): it will back out all of its attempts to combine things and leave you back on commit J, as if you'd never even run git merge at all.
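The stop-in-the-middle behavior, and backing out of it, can be sketched like this. Both sides change the same line of a made-up file in different ways, so the merge cannot finish on its own.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example"
git config user.email "example@example.com"

# Merge base H.
printf 'alpha\nbeta\ngamma\n' > F.txt
git add F.txt
git commit -q -m H
git branch branch2

# Our side rewrites the middle line.
printf 'alpha\nbeta from us\ngamma\n' > F.txt
git commit -q -a -m J
j=$(git rev-parse HEAD)

# Their side rewrites the same line differently.
git checkout -q branch2
printf 'alpha\nbeta from them\ngamma\n' > F.txt
git commit -q -a -m L
git checkout -q branch1

# The merge stops with a non-zero exit status.
if git merge -m M branch2 >/dev/null 2>&1; then merged=yes; else merged=no; fi

# Files still in the "unmerged" state are the conflicted ones.
conflicted=$(git diff --name-only --diff-filter=U | sort -u)

# Give up: back out of the merge and return to commit J.
git merge --abort
after=$(git rev-parse HEAD)
status=$(git status --porcelain)

echo "merged: $merged, conflicted: $conflicted"
```

To finish the merge instead of aborting it, you would edit F.txt to the desired result, git add it, and run git commit (or git merge --continue).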
The last tricky bit is this: when you do finish a merge, whether automatically through Git or manually, the resulting merge commit records two parents. That is, if you look at merge M above, you'll see that it connects back to both commits J and L. In many merges we'd draw this a little differently:
                o--o        <-- small-feature
               /    \
...--o--B--o--D--o---o--o   <-- mainline
         \
          o--o--o--o--o--o   <-- big-feature
Here the small feature got merged into the mainline, and the big feature is still in progress. The merge base of the small feature was commit D. The merge base of the big feature will be commit B. (The rest of the commits are not very interesting.) In some cases, though, we get a more-tangled graph:
                  o--o---o        <-- offshoot-feature
                 /  /     \
                o--o---o---o--o   <-- medium-feature
               /    \ /
...--o--o--o--o--o---o----o   <-- mainline
This graph isn't all that complicated, but it's really hard, now, to see where the merge bases are, because of all the cross merging from the various features into mainline and each other.
Git will find the merge bases. You can find merge bases yourself using git merge-base --all. You can draw the graph, or have Git draw it with git log --graph, and try to find merge bases by eyeball. Having found merge bases, however you did it, you can run the two git diff commands that git merge would run. This will tell you where your conflicts will be. But usually, there's no point: just run git merge and find the conflicts.
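If you do want a preview, a rough sketch along these lines lists the files that both sides changed since the merge base. It is a simplification: it only handles the single-merge-base case (it takes the first line of git merge-base --all), and a file appearing in both lists is only a candidate, since a real conflict also needs the changed regions to overlap or abut. The repository contents and variable names are made up.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example"
git config user.email "example@example.com"

# Merge base with three files.
printf 'shared\n' > F.txt
printf 'a\n' > a.txt
printf 'b\n' > b.txt
git add .
git commit -q -m base
git branch branch2

# Our side touches F.txt and a.txt.
printf 'shared, edited by us\n' > F.txt
printf 'a, edited\n' > a.txt
git commit -q -a -m ours

# Their side touches F.txt and b.txt.
git checkout -q branch2
printf 'shared, edited by them\n' > F.txt
printf 'b, edited\n' > b.txt
git commit -q -a -m theirs
git checkout -q branch1

bases=$(git merge-base --all branch1 branch2)    # one hash per line
base=$(printf '%s\n' "$bases" | head -n 1)       # sketch: single-base case only

# Run the two diffs git merge would run, keeping only the file names.
tmp=$(mktemp -d)
git diff --find-renames --name-only "$base" branch1 | LC_ALL=C sort > "$tmp/ours"
git diff --find-renames --name-only "$base" branch2 | LC_ALL=C sort > "$tmp/theirs"

# Files present in both lists are the only places a conflict can occur.
candidates=$(LC_ALL=C comm -12 "$tmp/ours" "$tmp/theirs")
rm -rf "$tmp"

printf 'possible conflicts:\n%s\n' "$candidates"
```

In this demo only F.txt is reported, and running git merge branch2 would indeed conflict there because both sides rewrote the same line.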