In git, why do I have a conflict when cherry picking only latest commit of binary file?

Question

Setup: I have a main branch where a binary file has been modified multiple times. Every commit that changed the binary file only had that one file in the commit (no other files were changed in any of those commits). I am using SourceTree GUI for all of my git commands.

Problem: I want to cherry pick that binary file into the release branch. Since binary files are checked in as single blocks, I thought I could just cherry pick the final commit that changed the file into the release branch. But that causes a conflict. If, on the other hand, I start at the earliest commit that changed the file and cherry pick them over one-by-one, there is no conflict.

Question: Why can't I just cherry pick the last commit without getting conflicts? What exactly is conflicted? Is there a better way to do this cherry picking without having to find all the commits that touched that file?

A conflict is normal in your case; just force Git to use the version you are cherry-picking: git cherry-pick -X theirs — Ôrel, Aug 08 '22 at 16:10
@Ôrel I get that it is normal... what I want to know is why is there a conflict? There's no change to the file on the release branch, so the usual reasons for a conflict (as I understand it) do not apply. — Inquisitor, Aug 08 '22 at 16:12
Git does not check the content here; it checks whether both commits have ancestors that changed the same file. — Ôrel, Aug 08 '22 at 17:03

score 2 · Accepted Answer · answered Aug 08 '22 at 17:51

A cherry-pick is a merge, of sorts: it's a merge in which the "merge base" is forced to be the parent of the commit being cherry-picked ("copied").

That is, given a branch structure like this:

          I--J   <-- br1
         /
...--G--H
         \
          K--L   <-- br2

when we run git switch br1 && git merge br2 we're asking Git to combine work done since a common starting point. Here, the "work done" is "whatever changed from commit H to commit J" ("our" work on br1), vs "whatever changed from commit H to commit L" ("their" work on br2). So Git diffs each file in commit H against each file in commit J: whatever changed, that's "our" work. Git then diffs each file in commit H against each file in commit L: whatever changed, that's "their" work. Git then combines the two sets of changes. Where the changes overlap, but don't exactly match, that's a conflict (note: this isn't a full list of all possible conflicts, just a high-speed review to cover the major cases).

Cherry picking is similar but different. We're given a structure like this:

       o--o--P--C--o--o   <-- br2
      /
...--*
      \
       A--B   <-- br1 (HEAD)

where we're sitting at commit B. We ask to "copy" commit C. This means finding out what changed in C, so Git needs to run the same kind of git diff between P (C's parent) and C that git merge would run for a regular merge. That yields the set of changes that C makes relative to P.

To apply those changes to commit B, though, Git needs to know where those changes fit in. What if we moved a block of code down by inserting a bunch of new code in A? What if we moved a block of code up by deleting some code in one of the o's before P, or in P itself? To find out which parts of commit C match up, Git does a git diff of the snapshot in commit P against the snapshot in commit B. Now Git knows about the blocks of code inserted or deleted.

To apply the change from P-to-C, then, Git can now use the information it found from P-vs-B. But—hang on a minute... that's exactly how git merge works in the first place. All Git has to do is combine the changes from P-to-C, "their" work, with the changes from P-to-B, "our" work. So Git literally uses the same git merge code.
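The difference between the two operations comes down to which three commits feed the merge. The sketch below is a toy model, not Git's implementation: the `PARENT` table is a hypothetical encoding of the diagram above (with the unnamed `o` commits labelled `o1` to `o4`), and it assumes a history where every commit has at most one parent.

```python
# Hypothetical encoding of the diagram: child -> parent (single-parent history only).
PARENT = {
    "o1": "*", "o2": "o1", "P": "o2", "C": "P", "o3": "C", "o4": "o3",  # br2
    "A": "*", "B": "A",                                                  # br1
}

def ancestors(commit):
    """Return the commit followed by all of its ancestors, nearest first."""
    chain = [commit]
    while commit in PARENT:
        commit = PARENT[commit]
        chain.append(commit)
    return chain

def merge_inputs(head, other):
    """(base, ours, theirs) for a regular merge: base is the nearest common ancestor."""
    their_side = set(ancestors(other))
    base = next(c for c in ancestors(head) if c in their_side)
    return base, head, other

def cherry_pick_inputs(head, picked):
    """(base, ours, theirs) for a cherry-pick: base is forced to the picked commit's parent."""
    return PARENT[picked], head, picked

print(merge_inputs("B", "o4"))       # ('*', 'B', 'o4')
print(cherry_pick_inputs("B", "C"))  # ('P', 'B', 'C')
```

With the base forced to P, "our" side of the merge is the P-vs-B diff, even though B does not descend from P.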

For text files this works great. The P-vs-B diff shows everything that differs on "our" side: work done between * and P that B never had looks as though "we" backed it out, and changes we actually made, such as those in A since *, show up as additions. Git combines that with the P-vs-C diff, so only what C itself changed gets applied on top of B. Git does not look at each individual change, one commit at a time: it just uses the wholesale P-vs-whatever diff to get everything at once.

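For the binary file, though, all Git knows is "hey, this is different". The binary file in P is different from the binary file in C, and the file in P is also different from the one in B. Both sides appear to have changed the same file since the merge base, and Git cannot combine two binary changes piece by piece. That's your conflict.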
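As a rough illustration, a three-way merge of a file that can only be compared as a whole reduces to a handful of equality checks. This is a simplified sketch rather than Git's actual code; `merge_opaque` and the byte strings standing in for the file's contents are made up for the example.

```python
from typing import Optional

def merge_opaque(base: bytes, ours: bytes, theirs: bytes) -> Optional[bytes]:
    """Whole-file three-way rule; returns None to signal a conflict."""
    if ours == theirs:
        return ours      # both sides ended up with the same content
    if ours == base:
        return theirs    # only "their" side changed the file
    if theirs == base:
        return ours      # only "our" side changed the file
    return None          # both sides differ from the base, and from each other

# Hypothetical contents of the binary file in each snapshot.
p_version = b"\x00v3"    # file in P, the forced merge base
c_version = b"\x00v4"    # file in C ("theirs")
b_version = b"\x00v0"    # file in B ("ours"), untouched on the release branch

result = merge_opaque(p_version, b_version, c_version)
print("conflict" if result is None else "clean")  # prints "conflict"
```

The release branch never modified the file, yet its copy still differs from the one in P, and that difference is all the merge can see.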

If you cherry-pick multiple commits, though, this might change. There are no guarantees here, but suppose that the binary file changed between * and the first o, but not between * and A or between A and B. Then if we cherry-pick the first o we pick up the change of the binary file: there's no conflict because we got just one change, from their side, * vs first o. Then there's another change of the same binary file between the two os or between the second o and commit P. There's still no change on "our" side because we've now picked up the *-vs-first-o change, so we pick up their change. Then when we get to P-vs-C, there's a change of the binary file again, but this time we're on a commit that has the P version of the binary file, so there's no conflict switching to the C version.
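The walk-through above can be replayed with a small simulation. It is only a sketch under stated assumptions: the version labels in `FILE_AT` are invented, every commit from the first `o` through C is assumed to change the binary file (as in the question), and the two `o` commits before P are labelled `o1` and `o2`.

```python
def merge_opaque(base, ours, theirs):
    """Whole-file three-way rule for an opaque blob; None means conflict."""
    if ours == theirs or theirs == base:
        return ours
    if ours == base:
        return theirs
    return None

# Hypothetical content of the binary file in each commit's snapshot.
FILE_AT = {"*": b"v0", "A": b"v0", "B": b"v0",
           "o1": b"v1", "o2": b"v2", "P": b"v3", "C": b"v4"}
PARENT = {"o1": "*", "o2": "o1", "P": "o2", "C": "P"}

def cherry_pick_all(start, picks):
    """Cherry-pick each commit in order, using its parent as the merge base."""
    current = FILE_AT[start]
    for commit in picks:
        base = FILE_AT[PARENT[commit]]
        merged = merge_opaque(base, current, FILE_AT[commit])
        if merged is None:
            return f"conflict at {commit}"
        current = merged  # the new commit carries the merged file forward
    return "clean"

print(cherry_pick_all("B", ["C"]))                    # conflict at C
print(cherry_pick_all("B", ["o1", "o2", "P", "C"]))   # clean
```

In the one-by-one case, each pick finds "our" copy identical to the base, so only "their" change applies.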

To reason about these things, you have to look at every commit in the chain when cherry-picking multiple commits, and at only the one commit (with its parent) when cherry-picking just one. This is why rebase is much more complicated than merge: rebase is repeated merging, and that's a different proposition from one-time merging.
