
I am importing an old svn repo into git. At one point a folder was renamed in all branches. This was done in svn by creating a duplicate with history, followed by a delete of the original on a second commit. So I have a repo that looks like this:

A -> B -> C -> D* -> E* -> F -> G -> H
      \-> 1 -> 2* -> 3* -> 4 -/

Where D/E and 2/3 are the commits I want to squash. The reason for squashing is that while svn knows of "duplicate with history", git doesn't see this as a rename since the original files weren't removed until the next commit, and I lose history on blame at this point.

I've experimented with some rebase scripts which work, but they also flatten all my branches. The above is a seriously simplified version of what I have to do, which is why I really need scripts as I can't do it manually. There are over 1,000 branches throughout the history of the SVN repo and probably a dozen parallel branches where this change was done (all at the same time).

The git repo has not been published yet, so maintaining hashes is irrelevant. I assume I'll need to use some filter-branch script, but I'm still trying to figure out how to manage that, which is what I was hoping to get help with here. I can provide the sha1 of every commit that needs to be squashed and its parent.

  • checkout 4, git rebase -i 1, change 2 and 3 commits to squash, they will squash up into 1 – g19fanatic Feb 22 '17 at 19:33
  • This leaves me with an orphaned 4', which I then have to rebase the rest of my tree onto. I have 7 years of history and nearly 100K commits, plus all the branches and merges that come with that after this point, which makes manually rebasing and fixing this history very complicated. – WorksOnMyBox Feb 22 '17 at 20:06

1 Answer

You want to use a git filter-branch using --parent-filter to replace any appearance of D's SHA with C's SHA. You can also look into .git/info/grafts or git replace, which might be simpler than writing a --parent-filter and can be made permanent with a filter-branch.
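
As a rough sketch of the --parent-filter route, the script below builds a throwaway repository with a copy commit (D) followed by a delete commit (E), then rewrites E so that its parent is C. The repository layout, the commit messages, and the SQUASH_CHILD / SQUASH_PARENT variable names are assumptions made up for this demo; `map` is the helper that filter-branch makes available to its filters.

```shell
#!/bin/sh
# Sketch: reparent E onto C with --parent-filter in a throwaway repo.
# Paths, messages, and the SQUASH_* names are invented for this demo.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

commit() {  # commit all changes, print the new SHA
  git add -A
  git commit -q -m "$1"
  git rev-parse HEAD
}

mkdir old
printf 'line 1\nline 2\nline 3\n' > old/file.txt
C=$(commit "C: original folder")
cp -R old new
commit "D: copy folder" >/dev/null
git rm -q -r old
E=$(commit "E: delete original")

# Exported so the filter, which is evaluated inside filter-branch, can see them.
export SQUASH_CHILD="$E" SQUASH_PARENT="$C"

git filter-branch --parent-filter '
  if [ "$GIT_COMMIT" = "$SQUASH_CHILD" ]; then
    echo "-p $(map "$SQUASH_PARENT")"   # map: rewritten SHA of the new parent
  else
    cat                                 # keep every other parent list as-is
  fi' -- HEAD >/dev/null 2>&1

# D is gone from the history; E now records the rename directly.
history=$(git log --format=%s HEAD)
status=$(git diff -M --name-status HEAD~1 HEAD)
echo "$history"
echo "$status"
```

With this approach the surviving commit keeps E's message, the same as with git replace followed by a filter-branch.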

Update: As @torek says, you should definitely use git replace. To use a real-life example, here's a rename from readme.md to README.md that was executed with an intermediate rename to README1.md: https://github.com/dahlbyk/posh-git/compare/dahlbyk:2b9342c...dahlbyk:57394c5. Let's call 2b9342c your C and 57394c5 your E:

$ git tag E 57394c5
$ git tag C 2b9342c
$ git tag G 450d8f1
$ git log --oneline --graph --decorate C~..G
*   450d8f1 (tag: G) Merge pull request #320 ...
|\  
| * 941935c Fix a few kbd / missing markdown issues/
| * f13dcf9 Upcase readme and have more prompt examples.
| * 57394c5 (tag: E) Now rename to README.md.
| * eb79ef2 Prepare to upcase README.md filename.
* |   536c57f Merge pull request #319 ...
|\ \  
| |/  
|/|   
| * 7fafb7b Speed up Get-GitStatus
|/  
* 2b9342c (tag: C) Merge pull request #313 ...

To pretend that the intermediate move never happened, I can replace E's parent (E~) with its grandparent (E~2 = C):

$ git log --stat --oneline C..E
57394c5 Now rename to README.md.
 README1.md => README.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
eb79ef2 Prepare to upcase README.md filename.
 readme.md => README1.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
$ git replace E~ C
$ git log --stat --oneline C..E
57394c5 Now rename to README.md.
 readme.md => README.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
eb79ef2 Merge pull request ...

Finally, a filter-branch will make the changes permanent:

$ git filter-branch -- ^C G E  # For demo, only rewrite G & E after C
$ git log --graph --oneline --decorate C~..G
*   fcfd345 (tag: G) Merge pull request #320 ...
|\  
| * fa76267 Fix a few kbd / missing markdown issues/
| * 4900687 Upcase readme and have more prompt examples.
| * b25aa5a (tag: E) Now rename to README.md.
* |   536c57f Merge pull request #319 ...
|\ \  
| |/  
|/|   
| * 7fafb7b Speed up Get-GitStatus
|/  
* 2b9342c (tag: C) Merge pull request #313 ...

For your purposes, you'll do something like:

$ git replace E~ E~2
$ git replace 3~ 3~2
$ git filter-branch -- ^A --all
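
Since you already have the SHA of every commit to squash and its parent, the replacements can be scripted. The following is only a sketch under assumptions: it creates a scratch repository with a copy commit (D) and a delete commit (E), writes a hypothetical pairs file with one "commit-to-hide commit-to-show-instead" pair per line, and loops over it with git replace. All paths, messages, and the pairs-file format are made up for illustration.

```shell
#!/bin/sh
# Sketch: hide the intermediate "copy" commit with git replace, driven by a
# list of SHA pairs. Repo contents and the pairs-file format are invented.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

commit() {  # commit all changes, print the new SHA
  git add -A
  git commit -q -m "$1"
  git rev-parse HEAD
}

mkdir old
printf 'line 1\nline 2\nline 3\n' > old/file.txt
C=$(commit "C: original folder")
cp -R old new
D=$(commit "D: copy folder")
git rm -q -r old
E=$(commit "E: delete original")

# Before: E only deletes the old path, so Git sees no rename.
before=$(git diff -M --name-status "${E}~1" "$E")

# Hypothetical pairs file: "<commit to hide> <commit to show instead>".
pairs=$(mktemp)
printf '%s %s\n' "$D" "$C" > "$pairs"

while read -r hide show; do
  git replace "$hide" "$show"   # readers of $hide now get $show's content
done < "$pairs"

# After: E's parent reads as C, so the copy and delete collapse into a rename.
after=$(git diff -M --name-status "${E}~1" "$E")
echo "before: $before"
echo "after:  $after"
```

The replacements are stored as refs under refs/replace/ and can be removed with git replace -d, so it is cheap to inspect the result with git log before running the filter-branch that makes it permanent.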

Update 2:

The commit message I get is off of E, which I don't care about. I'd rather have D's commit message (or a script provided message).

To keep D's commit metadata, I would suggest starting over and using a --commit-filter that gives D the tree of E (the tree line shown by git cat-file -p E) and skips E entirely, e.g.

git filter-branch --commit-filter '
  if [ "$GIT_COMMIT" = "SHA of D" ];
  then
    git commit-tree "TREE of E" -p "SHA of C";
  elif [ "$GIT_COMMIT" = "SHA of E" ];
  then
    skip_commit "$@";
  else
    git commit-tree "$@";
  fi;
  ' -- ^A E G
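
To see the --commit-filter idea end to end, here is a hedged sketch that runs it against a scratch repository. The repository contents and the SQUASH_INTO / SQUASH_DROP / SQUASH_TREE variable names are assumptions for the demo. One deliberate difference from a literal parent SHA: the filter uses shift to discard D's own tree and passes along the parent arguments filter-branch supplies, which already point at the rewritten parent.

```shell
#!/bin/sh
# Sketch: fold E into D with --commit-filter so the squashed commit keeps
# D's message and metadata. Paths, messages, and SQUASH_* names are invented.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name Demo

commit() {  # commit all changes, print the new SHA
  git add -A
  git commit -q -m "$1"
  git rev-parse HEAD
}

mkdir old
printf 'line 1\nline 2\nline 3\n' > old/file.txt
commit "C: original folder" >/dev/null
cp -R old new
D=$(commit "D: copy folder")
git rm -q -r old
E=$(commit "E: delete original")
echo more >> new/file.txt
commit "F: later work" >/dev/null

# Exported so the commit filter (run in a child shell) can read them.
export SQUASH_INTO="$D" SQUASH_DROP="$E"
SQUASH_TREE=$(git rev-parse "$E^{tree}")
export SQUASH_TREE

git filter-branch --commit-filter '
  if [ "$GIT_COMMIT" = "$SQUASH_INTO" ]; then
    shift                                  # drop the tree of D, keep "-p <parent>"
    git commit-tree "$SQUASH_TREE" "$@"    # message of D arrives on stdin
  elif [ "$GIT_COMMIT" = "$SQUASH_DROP" ]; then
    skip_commit "$@"                       # children of E attach to the new D
  else
    git commit-tree "$@"
  fi' -- HEAD >/dev/null 2>&1

history=$(git log --format=%s HEAD)
status=$(git diff -M --name-status HEAD~2 HEAD~1)
echo "$history"
echo "$status"
```

In the rewritten history the middle commit carries D's message but E's tree, so it shows up as a single rename relative to C.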
dahlbyk
  • If you get stuck, leave an update or comment with your progress and we can try to help. – dahlbyk Feb 22 '17 at 20:53
  • For this case, I would recommend running `git replace` twice: first, to make an `E'` that points back to `C`, with Git using `E'` instead of `E`; and then again to make `3'` that points back to `1`, with Git using `3'` instead of `3`. Then, as you said, optionally run `git filter-branch` to cement the replacements and discard the original `D-E` and `2-3` entirely. – torek Feb 23 '17 at 01:26
  • Ah yes, didn't notice the same operation was needed for `2-3`. Updated. – dahlbyk Feb 23 '17 at 18:42
  • This is great, my end tree is exactly what I want! Thank you! I didn't know about Git Replace. 2 Questions with this: 1) What is the ^A in your filter-branch? I am familiar with the rest of the notations, but not sure what that means. I assume it marks commit A as a start point to avoid filtering anything before? 2) The commit message I get is off of E, which I don't care about. I'd rather have D's commit message (or a script provided message). This isn't a showstopper though, what you've given me already is enough to move forward! Thank you! – WorksOnMyBox Feb 23 '17 at 21:16
  • `^A` means "commits not reachable from `A`" (described in [`git rev-list` help](https://git-scm.com/docs/git-rev-list)), which is just an optimization for your `filter-branch` since you don't need to rewrite any commits before `A`. The easiest way to change `D`'s commit message would probably be to use a `--msg-filter` that essentially does a search/replace from new message to the old. Another option would be to start over, skipping `git-replace` in favor of a `--commit-filter` that specifies `E^{tree}` for `D` and skips commit `E`. – dahlbyk Feb 23 '17 at 22:26