
I've inherited some code that uses `git log --no-merges --right-only --cherry-pick --since='2 months ago' some_tag..origin/master -- path1 path2 ...` as an initial step in determining which commits are missing from `some_tag`. The main problem is that it's slow and gives no progress output.

I can use `git log --since='2 months ago' origin/master -- path1 path2 ...` to get all the commits added for those paths in the specified time, which is fast. I'd then like to spawn multiple threads to check the commits individually, but I'm not sure what the equivalent check would be for a single commit. Perhaps generating a patch file and using `git apply --check` and `git apply --reverse --check`, but I'm not sure that would be equivalent.

Or perhaps there is a more direct way to do it?

aviso
  • Maybe `git commit-graph write` could make a difference? Not sure. – Guildenstern May 03 '23 at 21:14
  • No, tried it and it didn’t make a difference :) – Guildenstern May 03 '23 at 21:55
  • How slow are we talking about? How many commits are listed in `git log tag..branch` and `git log branch..tag`? – LeGEC May 04 '23 at 05:53
  • note: git computes some form of hash for each diff, so the size of the individual diffs also counts. For example, if you have some versioned binary files that get modified on both sides, this would increase the time it takes to compute those hashes. – LeGEC May 04 '23 at 05:55
  • 14-18 minutes. 77 commits both ways when limiting to the specific paths and time, 17k commits if just limiting to time. But I don't think it's relevant; the question is how to parallelize it. – aviso May 04 '23 at 10:59
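For LeGEC's question about how many commits each side has, `git rev-list` can count both sides of a symmetric difference in a single call. A minimal sketch, assuming a shell with `local` (bash or dash); `count_sides` is a hypothetical helper name, and `some_tag`, `origin/master` and the paths are the placeholders from the question:

```shell
# count_sides LEFT RIGHT [PATH...]
# Hypothetical helper. Prints "<left>\t<right>": the number of non-merge
# commits reachable only from LEFT and only from RIGHT, optionally limited
# to commits that touch the given paths.
count_sides() {
  local left=$1 right=$2
  shift 2
  git rev-list --no-merges --left-right --count "$left...$right" -- "$@"
}
```

For example, `count_sides some_tag origin/master path1 path2` prints two tab-separated numbers. A `--since=...` option can be added to the `rev-list` call to match the time limit used in the question.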

2 Answers


Per this conversation, the file paths are what make this a long-running operation, and the time can be reduced significantly by taking the output of `git log --no-merges --right-only --cherry-pick --since='2 months ago' some_tag..origin/master` and filtering it with the output of `git log --since='2 months ago' origin/master -- path1 path2 ...`. In my case this produces the same result in seconds. It is not clear whether this works in all cases or whether it is something that could be optimized internally.
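The filtering described in this answer can be sketched as a small shell function. This is an illustration under assumptions: `missing_via_filter` is a hypothetical name, and it uses the three-dot form `left...right`, because `--cherry-pick` and `--right-only` are documented to operate on a symmetric difference. The cherry-pick comparison runs without a pathspec, and a hash is kept only if it also appears in the fast path-limited log:

```shell
# missing_via_filter LEFT RIGHT SINCE PATH...
# Hypothetical helper: run the --cherry-pick comparison WITHOUT a pathspec,
# then keep only the commits that the fast, path-limited log also lists.
missing_via_filter() {
  local left=$1 right=$2 since=$3 wanted
  shift 3
  wanted=$(mktemp)
  # Fast step: commits on RIGHT that touch the paths within the time window.
  git log --format=%H --since="$since" "$right" -- "$@" > "$wanted"
  # Unrestricted step: commits only on RIGHT with no equivalent patch on
  # LEFT, filtered to exact hash matches from the fast step. grep keeps the
  # order of this log's output.
  git log --format=%H --no-merges --right-only --cherry-pick \
      --since="$since" "$left...$right" | grep -Fx -f "$wanted"
  rm -f "$wanted"
}
```

For example: `missing_via_filter some_tag origin/master '2 months ago' path1 path2`. As the answer says, it is not established that this matches the path-limited `--cherry-pick` result in every case.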

aviso

You can call `git patch-id` (in parallel) on the patch for each of these commits, and then compare which commits have the same patch-id on both sides.
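A sketch of this approach in shell, assuming git, awk and an `xargs` that supports `-P` (GNU and BSD xargs both do). The helper names and the job count are hypothetical. Each side is listed with `git rev-list`, every commit's diff is fed to `git patch-id --stable` in its own process, and a commit on the branch side is reported when its patch-id does not occur on the tag side:

```shell
# side_patch_ids RANGE JOBS
# Hypothetical helper. Prints "<patch-id> <commit>" for each non-merge commit
# in RANGE. Each commit gets its own `git diff-tree -p | git patch-id`
# pipeline and xargs runs up to JOBS of them at once, so order is not kept.
side_patch_ids() {
  git rev-list --no-merges "$1" |
    xargs -n 1 -P "$2" sh -c '[ -n "$1" ] || exit 0; git diff-tree -p "$1" | git patch-id --stable' sh
}

# missing_by_patch_id TAG BRANCH JOBS
# Hypothetical helper. Prints commits reachable from BRANCH but not TAG whose
# patch-id matches no commit reachable from TAG but not BRANCH.
missing_by_patch_id() {
  local tag=$1 branch=$2 jobs=$3 tmp
  tmp=$(mktemp -d)
  # Piping through sort collects the concurrent writers into one file.
  side_patch_ids "$branch..$tag" "$jobs" | sort > "$tmp/left"
  side_patch_ids "$tag..$branch" "$jobs" | sort > "$tmp/right"
  # First file: remember its patch-ids. Second file: print unmatched commits.
  awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
       !($1 in seen) { print $2 }' "$tmp/left" "$tmp/right"
  rm -rf "$tmp"
}
```

For example, `missing_by_patch_id some_tag origin/master 8`. `--stable` is used on both sides so the two sets of patch-ids are computed the same way; a `--since=...` option or a pathspec could be added to the `rev-list` call to narrow each side.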

It is not entirely clear whether you are looking to port complete commits (in which case you should compute the patch-id on the complete diff) or only the changes to that specific subset of files (in which case you could restrict the input diff to that set of paths).
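If only the listed paths matter, the diff fed to `git patch-id` can be limited with a pathspec. A minimal sketch; `patch_id_for_paths` is a hypothetical helper, and a commit that touches none of the paths produces no output, since there is no diff to hash:

```shell
# patch_id_for_paths COMMIT PATH...
# Hypothetical helper. Prints "<patch-id> <commit>" computed only over the
# part of COMMIT's diff that touches PATH..., or nothing if none is touched.
patch_id_for_paths() {
  local commit=$1
  shift
  # diff-tree prefixes the patch with the commit id, which patch-id echoes.
  git diff-tree -p "$commit" -- "$@" | git patch-id --stable
}
```

Usage: `patch_id_for_paths <commit> path1 path2`. Computing it this way on both sides means two commits compare equal when their changes to those paths are the same, even if they differ elsewhere.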

LeGEC
  • This does get a lot of them, but not all. The current sample has 71 commits. `git log --no-merges --right-only --cherry-pick` returns 48 missing commits, while this method returns 63, so it fails to match 15, though it produces no false positives. I could run these through `git apply` (looking for failure) and `git apply --reverse` (looking for success) to identify another 4, but that still leaves 11 false negatives. – aviso May 05 '23 at 19:46
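The `git apply` check mentioned in this comment can be run per commit without touching the working tree by checking against a temporary index. A sketch under assumptions: `already_applied` is a hypothetical helper; it treats a commit as already present when its patch fails to apply forward but applies in reverse, which is a textual heuristic and not the same test as `--cherry-pick`; and it assumes it is run from the top level of the work tree:

```shell
# already_applied TREEISH COMMIT [PATH...]
# Hypothetical helper. Succeeds when COMMIT's diff (optionally limited to
# PATH...) does NOT apply to TREEISH but DOES apply in reverse.
# A throwaway index (GIT_INDEX_FILE) keeps the real index and working tree
# untouched, so several checks can run in parallel.
already_applied() {
  local treeish=$1 commit=$2 dir rc=1
  shift 2
  dir=$(mktemp -d)
  # Load TREEISH into a fresh, private index file.
  GIT_INDEX_FILE="$dir/index" git read-tree "$treeish"
  git diff-tree -p --no-commit-id "$commit" -- "$@" > "$dir/patch"
  if ! GIT_INDEX_FILE="$dir/index" git apply --cached --check "$dir/patch" 2>/dev/null &&
       GIT_INDEX_FILE="$dir/index" git apply --cached --check --reverse "$dir/patch" 2>/dev/null
  then
    rc=0
  fi
  rm -rf "$dir"
  return "$rc"
}
```

For example, `already_applied some_tag <commit> path1 path2 && echo present`.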