
I have been trying to use

git log --no-merges --cherry-pick --right-only master...my-branch

to generate a list of commits that are in my-branch but not in master (as per the git-log documentation). However, many equivalent commits still show up in the list. If I show them and their patches, there is no difference apart from the commit id.

git show 16cbd0e47406a4f7acbd6dc13f02d74d0b6a7621 >patcha
git show c53c7c32dcd84bfa7096a50b27738458e84536d5 >patchb

diff patcha patchb
1c1
< commit 16cbd0e47406a4f7acbd6dc13f02d74d0b6a7621
---
> commit c53c7c32dcd84bfa7096a50b27738458e84536d5

And even git patch-id shows them as being equivalent:

git show c53c7c32dcd84bfa7096a50b27738458e84536d5 | git patch-id
2b5504fb9a8622b4326195d88c7a20f29701e62b c53c7c32dcd84bfa7096a50b27738458e84536d5
git show 16cbd0e47406a4f7acbd6dc13f02d74d0b6a7621 | git patch-id
2b5504fb9a8622b4326195d88c7a20f29701e62b 16cbd0e47406a4f7acbd6dc13f02d74d0b6a7621

How does git log --cherry-pick not pick these up as duplicates?
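
The manual comparison above can be wrapped in a small helper. This is only a sketch: `patch_id_of` and `same_patch` are hypothetical names, not Git commands, and the helper assumes both arguments are non-merge commits in the current repository.

```shell
# patch_id_of REV: print only the patch ID for a commit.
# "git patch-id" prints "<patch-id> <commit-id>"; keep the first field.
patch_id_of() {
    git show "$1" | git patch-id | cut -d' ' -f1
}

# same_patch REV_A REV_B: succeed when both commits have the same,
# non-empty patch ID (hypothetical helper, not part of Git).
same_patch() {
    a=$(patch_id_of "$1")
    b=$(patch_id_of "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For the two commits in question, `same_patch 16cbd0e c53c7c3 && echo equivalent` would print `equivalent` if their patch IDs match.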

Wivlaro
  • That should be fixed with Git 2.31 (Q1 2021): https://stackoverflow.com/a/65946225/6309 – VonC Jan 28 '21 at 22:41

2 Answers


Have you merged master into your branch since doing the cherry picks? --cherry-pick works first by matching the commit id, and then if that fails, looking for the patch id. If you've merged master into your branch, then you'll now have the actual commit on your branch and the cherry-picked version. So it'll find the commit id, and then proceed to report the cherry-picked version.

I've often wondered if git should always check both, but that's probably a considerable performance hit.
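
The situation described above can be reproduced in a throwaway repository. This is a sketch under assumptions: the file names, commit messages and the temporary directory are made up for illustration, and the final command is left without an expected output because the result may depend on your Git version.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Force the branch name so the sketch does not depend on init.defaultBranch.
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

# Unrelated work on the branch, so the later merge is not a fast-forward.
git checkout -q -b my-branch
echo work > branch.txt
git add branch.txt
git commit -q -m "branch work"

# The original commit (C) lands on master.
git checkout -q master
echo fix >> file.txt
git commit -q -am "fix"

# Cherry-pick it onto my-branch (C'), then merge master as well,
# so my-branch now contains both C and C'.
git checkout -q my-branch
git cherry-pick master >/dev/null
git merge -q --no-edit master

# C is now reachable from both branches.
git branch --contains master

# Check whether the cherry-picked "fix" commit is still listed.
git log --oneline --no-merges --cherry-pick --right-only master...my-branch
```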

John Szakmeister
  • The commit ids do appear to be different on each branch. But they may have both been merged into both branches at different points (if that makes sense). – Wivlaro Apr 04 '13 at 10:00
  • Cherry-picking will always introduce a new commit id, because the parent or time has changed. But if `master` was merged to `my-branch`, both versions now exist on the branch, so the cherry-picked version will be reported since the exact match by commit id takes precedence. Try using `git branch --contains 16cbd0e` and `git branch --contains c53c7c3`. I bet at least one of them shows both `master` and `my-branch`. One thing to consider: if you're going to merge a branch, you probably shouldn't cherry-pick from it. Not only because of this, but it makes the history confusing too. – John Szakmeister Apr 04 '13 at 10:19
  • This is still bothering me. We merge to keep the histories consistent and we DO end up cherry-picking because it's hard to enforce that kind of discipline of knowing where you should be doing the work at the point that it's getting done. Cherry-picking has ended up being inevitable. Bugs get fixed first, the location is often an afterthought. I'm still looking for a way to remove these duplicate commits from a view of the log if you have any other ideas besides writing my own tool to look at the patch ids. – Wivlaro Apr 22 '13 at 09:40
  • @Wivlaro Unfortunately, I don't see a way to do anything else. I had tracked down this behavior several months ago, and went diving into the Git source code. I didn't see an option to make it filter on both patch and commit id. :-( – John Szakmeister Apr 22 '13 at 10:14
  • @jszakmeister I am also troubled by this problem. In my workflow, master tracks upstream, develop periodically merges master, and feature branches are based on develop. After a feature is completed, master pulls the newest upstream and develop merges it; I rebase the feature onto the new develop, then create a new branch 'ready-for-review' that points to the feature and rebase it onto master. After it is included upstream, develop merges both the updated master and the feature. So there will be two duplicate commits, one from master and one from the feature, but develop also has commits not in master, which need to be shown. – weynhamz Dec 30 '13 at 10:55
  • @jszakmeister From your experience with the relevant code, is this easy or even possible to implement? – weynhamz Dec 30 '13 at 10:56
  • @TechliveZheng It's definitely possible, and I even think it's simple enough. I think the hurdle is going to be the performance barrier. It means that you would need to search on both the patch id and the commit id, and that may put the performance into an area that the Git project would possibly find unacceptable. It's not as cut and dried as a 50% performance degradation since you'd only be checking the patch id of commits not present on either side. It's probably worth bringing up on the Git mailing list. – John Szakmeister Dec 30 '13 at 11:57
  • @jszakmeister Yeah, there should be an option for the user. I've noted this and will very likely put some effort into it someday. – weynhamz Dec 30 '13 at 13:27
  • Yeah, this is annoying. The command `git cherry` has the same problem. – pavon Aug 20 '14 at 20:00
  • I thought the purpose for `git log --cherry-pick downstream..upstream` was to see if your upstream branch was ahead of the downstream branch by any commits that HAVE NOT been cherry-picked... If I'm merging master into my branch, that defeats the purpose of the cherry-pick IMO (I'm not sure how well I understand the comments above, but I'm trying to solve the same problem). – richardpringle Jan 23 '18 at 18:38
  • @richardpringle You're right, it is. But if you have a branch (X) that cherry-picked (C') a commit (C) from another branch (Y) and then merge Y into X, you'll find that cherry-pick doesn't work as expected. IMHO, it's a bug in the implementation. Git found the actual commit id so it doesn't ferret out versions of that same commit that have been cherry-picked. In other words, since it found C present on the branch, it doesn't do the necessary checks to see if a C' exists. Git only falls back to the slow case (checking for C') when the fast one fails. I hope that makes sense. :-) – John Szakmeister Jan 23 '18 at 18:56
  • @jszakmeister, but I don't understand why I would ever merge Y into X. If I'm going to do that, it defeats the purpose of cherry-picking. You would do one or the other, not both. My problem is that (without the merge) C' is still showing up. – richardpringle Jan 23 '18 at 22:23
  • @richardpringle Again, I completely agree. However, I’ve seen people do it and it does result in this problem. I chalk it up as a newbie mistake (they don’t really get what the workflow should be). In your case it may simply be that the patch ids are different. Take a look at the output of “git show C C’ | git patch-id”. If the first number of each line differs, then cherry-pick will fail to see that the commits are associated with each other. This generally happens when there’s enough difference in the surrounding code that you have to adjust things slightly, and that results in a different diff. – John Szakmeister Jan 24 '18 at 12:01
  • @jszakmeister that `git show C C' | git patch-id` helped me understand what was happening! Thanks a lot! I actually have no clue what I was doing before but I managed to get the command to work as expected on `git 2.16.1`. – richardpringle Jan 29 '18 at 20:42
  • @richardpringle Awesome! Glad it helped. It definitely takes a bit of work to wrap your head around the details of Git, but once you do it's a pretty amazing tool. :-) – John Szakmeister Jan 29 '18 at 20:44
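
The "own tool to look at the patch ids" mentioned in the comments above could look roughly like this. It is only a sketch: `unique_commits` is a hypothetical helper, not a Git command, and it computes patch IDs for the whole upstream history, so it may be slow on large repositories.

```shell
# unique_commits UPSTREAM BRANCH
# Print the ids of non-merge commits that are on BRANCH but not on UPSTREAM
# and whose patch ID does not match any non-merge commit on UPSTREAM.
# (Hypothetical helper; scans all of UPSTREAM's history.)
unique_commits() {
    upstream=$1
    branch=$2
    seen=$(mktemp)

    # Collect the patch IDs of every non-merge commit reachable from UPSTREAM.
    git log --no-merges -p "$upstream" \
        | git patch-id | cut -d' ' -f1 | sort -u > "$seen"

    # "git patch-id" prints "<patch-id> <commit-id>" per commit; keep the
    # commits whose patch ID was not seen on UPSTREAM.
    git log --no-merges -p "$upstream..$branch" \
        | git patch-id \
        | while read -r pid cid; do
              grep -qxF "$pid" "$seen" || echo "$cid"
          done

    rm -f "$seen"
}
```

Running `unique_commits master my-branch` prints one commit id per line, newest first. Commits with an empty diff produce no patch ID and are therefore not listed.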

I've often wondered if git should always check both, but that's probably a considerable performance hit.

That behavior is now (Git 2.11, Q4 2016) quicker than before.

See commit 7c81040 (12 Sep 2016), and commit 5a29cbc (09 Sep 2016) by Jeff King (peff).
Helped-by: Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit f0a84de, 21 Sep 2016)

patch-ids: refuse to compute patch-id for merge commit

"git log --cherry-pick" used to include merge commits as candidates to be matched up with other commits, resulting a lot of wasted time. The patch-id generation logic has been updated to ignore merges to avoid the wastage.

[...] we may spend a lot of extra time computing these merge diffs.
In the case that inspired this patch, a "git format-patch --cherry-pick" dropped from over 3 minutes to less than 3 seconds.


And with Git 2.31 (Q1 2021), it is fixed: when more than one commit with the same patch ID appears on one side, "git log --cherry-pick A...B" now does exclude them all when a commit with the same patch ID appears on the other side.

VonC
  • I'm on Git 2.35 and it doesn't seem fixed for me unless I'm misunderstanding. To replicate: I have 2 branches, `dev` and `master`. `dev` is ahead of `master` by 1 commit, so I cherry-pick that commit to `master`, and then merge `master` into `dev`. When I then run `git log --cherry-pick master...dev` the original commit still shows up, even though it has the same patch-id as the cherry-pick commit. – André T. Jun 20 '22 at 12:31
  • @AndréT. Interesting. Can you post a separate question to illustrate that scenario and bug? Also, 2.37 is right around the corner. – VonC Jun 20 '22 at 12:58