
I know how to solve this using traditional tools at hand, but I want to understand the possibilities and see whether things can be done more effectively.

In this example, I want to add B as a second parent of C.

main   ∙∙∙A---C---D         ∙∙∙A---C---D  
                       ->         /   
feat   ∙∙∙--B               ∙∙∙--B   
  • I am looking for a "permanent" history rewrite solution,
  • without writing "virtual" replace refs or the likes.
  • C's and D's author (can be someone other than me) & date must not change.
  • The solution must not rewrite unnecessary commits, so in this case it can only rewrite C and D.

In short, I want to permanently write the result of something like the following into the repo, using plain, simple commits instead of replace refs:

git replace --graft $commit $parent1 $parent2

I remember studying this topic in the "Plumbing and Porcelain" part of the Git docs and spending too much time on it, but I have no recollection of what I learned. What can I try next?

halfer
Qwerty

3 Answers


In your example, we could use git commit-tree and git update-ref.

# Create the substitute of C
s=$(git log -1 --pretty=%B C | GIT_AUTHOR_NAME=$(git log -1 --pretty=%an C) \
    GIT_AUTHOR_EMAIL=$(git log -1 --pretty=%ae C) \
    GIT_AUTHOR_DATE=$(git log -1 --pretty=%ad --date=iso C) \
    GIT_COMMITTER_NAME=$(git log -1 --pretty=%cn C) \
    GIT_COMMITTER_EMAIL=$(git log -1 --pretty=%ce C) \
    GIT_COMMITTER_DATE=$(git log -1 --pretty=%cd --date=iso C) \
    git commit-tree -p A -p B -F - C^{tree})

# Update main
git update-ref refs/heads/main $s

In the same way, create the substitute of D on top of the substitute of C (which main now points to) and move main to it:

git update-ref refs/heads/main \
    $(git log -1 --pretty=%B D | GIT_AUTHOR_NAME=$(git log -1 --pretty=%an D) \
        GIT_AUTHOR_EMAIL=$(git log -1 --pretty=%ae D) \
        GIT_AUTHOR_DATE=$(git log -1 --pretty=%ad --date=iso D) \
        GIT_COMMITTER_NAME=$(git log -1 --pretty=%cn D) \
        GIT_COMMITTER_EMAIL=$(git log -1 --pretty=%ce D) \
        GIT_COMMITTER_DATE=$(git log -1 --pretty=%cd --date=iso D) \
        git commit-tree -p main -F - D^{tree})
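The two commands above can be folded into a small bash function. This is a sketch, and the name `recommit` is made up for this example. It also copies the author and committer email addresses, reads the message from the raw commit object (everything after the first empty line) so it stays byte-for-byte identical, and uses `--date=raw` so each timestamp and UTC offset is carried over exactly.

```shell
#!/bin/bash
# recommit COMMIT PARENT...   (hypothetical helper)
# Prints the id of a new commit with COMMIT's tree, message, author and
# committer (name, email, date), but with PARENT... as its parents.
# It moves no refs; combine it with git update-ref.
recommit() {
    local c=$1 p
    shift
    local parents=()
    for p in "$@"; do
        parents+=(-p "$p")
    done
    # The message is everything after the first empty line of the raw object.
    git cat-file commit "$c" | sed '1,/^$/d' |
        GIT_AUTHOR_NAME="$(git log -1 --format=%an "$c")" \
        GIT_AUTHOR_EMAIL="$(git log -1 --format=%ae "$c")" \
        GIT_AUTHOR_DATE="$(git log -1 --format=%ad --date=raw "$c")" \
        GIT_COMMITTER_NAME="$(git log -1 --format=%cn "$c")" \
        GIT_COMMITTER_EMAIL="$(git log -1 --format=%ce "$c")" \
        GIT_COMMITTER_DATE="$(git log -1 --format=%cd --date=raw "$c")" \
        git commit-tree "${parents[@]}" -F - "$c^{tree}"
}
```

Assuming the function is loaded into your shell, the whole rewrite from the question would be `git update-ref refs/heads/main "$(recommit D "$(recommit C A B)")"`.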

When we create the substitute, we reuse the commit message, the author name and date, the committer name and date, and the tree object referenced by the commit. Only the parents are changed.

We get the commit message with git log and pipe it into git commit-tree's standard input.

The GIT_ environment variables set the author and committer of the new commit.

-p A and -p B specify the parents. Their order matters: A becomes the first parent and B the second.

-F - reads the commit message from stdin.

C^{tree} means to reuse the tree object referenced by C, so that C and its substitute have the same directories and files.
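To confirm that only the parents changed (the question requires the author and date of C and D to stay the same), a check along these lines can compare the original commit with its substitute. `same_except_parents` is a hypothetical helper, not a Git command. Note that the command substitutions drop trailing newlines, so a message that differs only by extra trailing blank lines would not be reported.

```shell
#!/bin/bash
# same_except_parents OLD NEW   (hypothetical helper)
# Exit status 0 when NEW has the same tree, author, committer and message
# as OLD, i.e. when the rewrite changed nothing but the parent lines.
same_except_parents() {
    # %T = tree id, %an/%ae/%ad = author, %cn/%ce/%cd = committer, %B = message.
    # --date=raw prints the exact Unix timestamp and UTC offset.
    local fmt='%T%n%an <%ae> %ad%n%cn <%ce> %cd%n%B'
    [ "$(git log -1 --date=raw --format="$fmt" "$1")" = \
      "$(git log -1 --date=raw --format="$fmt" "$2")" ]
}
```

For example: `same_except_parents C "$s" && echo "only the parents differ"`.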

Note that, made this way, the substitute of C is a hand-crafted commit: the contents of its files are not produced by actually merging A and B. We simply reuse the contents of C's files. If you want a real merge, use git merge to create the merge commit first and then rewrite that merge commit with the metadata of C.

ElpieKay
  • Interesting, what is the practical difference between a "man-made commit" and "natural merge" if I replace the content of the merge with `C`'s data anyway, so that it results in same content as in case with the man-made commit? – Qwerty Feb 22 '23 at 01:56
  • @Qwerty English is not my native language, so "man-made commit" may not be the right term. With a real merge, the contents of C's substitute would be the combination of A's contents and B's, as `git checkout A && git merge B` would produce. In my answer, C's substitute is a merge commit whose history includes B, but its contents don't include B's contents (those unique to B), similar to what `git checkout A && git merge -s ours B` does. As for `-s ours`, see https://www.git-scm.com/docs/git-merge#Documentation/git-merge.txt-ours-1. – ElpieKay Feb 22 '23 at 02:13
  • Understood and thanks! Yes, I am aware that `C` won't physically include the changes introduced in `B`. That was not the goal. But now I see that I have chosen wrong names for the branches. I should use something like `main | test`. `feat` is something you usually want to include! :'D Here, I only want to link the branches together, but keep 100% `C`'s content. So, if I had done a merge commit `B->C` and then did a hard reset of `C` on `C-with-merge`, I would not gain any practical difference compared to artificially creating the commit like in your answer. There would be no difference, right? – Qwerty Feb 22 '23 at 02:17
  • @Qwerty Sorry, I don't clearly understand what `if I had done a merge commit B->C and then did a hard reset of C on C-with-merge` means. Anyway, you could give it a try. Instead of "man-made" commit, I think "forged" commit is the more apt term. – ElpieKay Feb 22 '23 at 02:36

Also worth knowing:

If you have replaced commits, running git filter-branch or git filter-repo in any form will "set in stone" these replacements:
all replaced commits and their descendants will be rewritten.

So running git filter-repo --partial (with no specific action) or git filter-branch --index-filter true will create actual commits matching the replacement rules you provided.

To avoid having such commands act on the complete history of your branch, you may want to narrow down the range of commits to rewrite, for example:

# if commit C is part of several branches, you need to name them all here
# (you can also move them manually afterwards, but it could be more cumbersome)
git filter-repo --partial --refs=A..HEAD
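As a sketch of the whole sequence (graft, rewrite, clean up), here it is with git filter-branch, chosen only because it ships with Git while git filter-repo has to be installed separately. `bake_graft` is a made-up name; the sketch assumes a clean working tree and no leftover `refs/original/` backup from an earlier filter-branch run. Instead of an `A..` range, it excludes everything reachable from the new parents, so B itself is never rewritten.

```shell
#!/bin/bash
# bake_graft COMMIT BRANCH NEW_PARENT...   (hypothetical helper)
# Grafts COMMIT onto NEW_PARENT..., makes the graft permanent on BRANCH
# with git filter-branch, then removes the replace ref again.
bake_graft() {
    local commit branch=$2 p
    commit=$(git rev-parse --verify "$1^{commit}") || return
    shift 2                                   # "$@" is now the list of new parents
    local exclude=()
    for p in "$@"; do
        exclude+=("^$p")                      # do not rewrite the new parents' history
    done
    git replace --graft "$commit" "$@" || return
    # --index-filter true changes nothing; it just makes filter-branch write
    # every commit in the range again. filter-branch honors refs/replace/ and
    # copies author and committer from each original commit.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
        git filter-branch --index-filter true -- "$branch" "${exclude[@]}" || return
    git replace -d "$commit"                  # the rewritten commits no longer need it
}
```

`bake_graft C main A B` would then leave main with the rewritten C and D. filter-branch keeps the old tip as `refs/original/refs/heads/main`; once you are happy with the result, `git update-ref -d refs/original/refs/heads/main` removes that backup.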
LeGEC

For this task we will need:

  • git cat-file
  • sed
  • git hash-object
  • git update-ref

Let's create a sample repository. This script sets things up so that we have repeatable commit ids and we don't run into any issues caused by weird local git configurations:

#!/bin/bash

set -e

HOME=$PWD
GIT_AUTHOR_NAME=Alice
GIT_AUTHOR_EMAIL=alice@example.com
GIT_AUTHOR_DATE="2023-01-01 00:00:00 -0500"
GIT_COMMITTER_NAME=$GIT_AUTHOR_NAME
GIT_COMMITTER_EMAIL=$GIT_AUTHOR_EMAIL
GIT_COMMITTER_DATE=$GIT_AUTHOR_DATE
export HOME GIT_{AUTHOR,COMMITTER}_{NAME,EMAIL,DATE}

workdir="$(mktemp -d "$PWD/gitXXXXXX")"
trap 'cd /; rm -rf "$workdir"' EXIT

cd "$workdir"

git config --global init.defaultBranch main
git init

for x in A C D; do
    echo "file for commit $x" > file-$x
    git add file-$x
    git commit -m "$x"
done

git checkout --orphan feat
git reset
echo "file for commit B" > file-B
git add file-B
git commit -m 'B'
git checkout -f main

PS1="git$ " bash --norc

That gets us:

git$ git log --oneline
6352bde (HEAD -> main) D
25635b4 C
79a5602 A
git$ git log --oneline feat
db65aa0 (feat) B

We can use git cat-file -p to dump the structure of a commit. We want to add a new parent to commit C, which looks like:

git$ git cat-file -p 25635b4
tree dfa1779e5574c1b6f1c9c9071aa1a820b1e03680
parent 79a56022dc4511577b0281bb034b56e0352d2e36
author Alice <alice@example.com> 1672549200 -0500
committer Alice <alice@example.com> 1672549200 -0500

C

To make B a parent of this commit, we need to add a second parent line. We need the full commit id for commit B:

git$ git rev-parse feat
db65aa0d30cf551fdd25ad93d0c8e2f8da057572

We can add that as a parent of C using sed, like this:

git$ git cat-file -p 25635b4 | sed '/^parent / a\parent db65aa0d30cf551fdd25ad93d0c8e2f8da057572'
tree dfa1779e5574c1b6f1c9c9071aa1a820b1e03680
parent 79a56022dc4511577b0281bb034b56e0352d2e36
parent db65aa0d30cf551fdd25ad93d0c8e2f8da057572
author Alice <alice@example.com> 1672549200 -0500
committer Alice <alice@example.com> 1672549200 -0500

C

That looks right. Now we need to write that into the object database:

git$ git cat-file -p 25635b4 | sed '/^parent / a\parent db65aa0d30cf551fdd25ad93d0c8e2f8da057572' | git hash-object -t commit --stdin -w
a6db46299e550128fa8534dcc001f961ac4265c5
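This is also why hand-editing works: an object id is just the SHA-1 of a short header (the word `commit`, a space, the content size in bytes, and a NUL byte) followed by the content, which is what `git hash-object -t commit` computes before storing the object. A sketch that does the computation by hand, assuming a SHA-1 repository (the default) and GNU coreutils' `sha1sum`; `manual_commit_id` is an illustrative name:

```shell
#!/bin/bash
# manual_commit_id   (hypothetical helper; SHA-1 repositories only)
# Reads a raw commit object on stdin and prints the id Git would give it:
# SHA-1 over "commit <size>\0" followed by the content.
manual_commit_id() {
    local tmp size
    tmp=$(mktemp) || return
    cat > "$tmp"                          # keep the exact bytes, trailing newlines included
    size=$(wc -c < "$tmp" | tr -d ' ')    # byte count (BSD wc pads it with spaces)
    { printf 'commit %s\0' "$size"; cat "$tmp"; } | sha1sum | cut -d ' ' -f 1
    rm -f "$tmp"
}
```

`git cat-file commit 25635b4 | manual_commit_id` should print C's full id, and feeding it the sed-edited text from above should print the same id as `git hash-object` did. Unlike `git hash-object -w`, it writes nothing to the object database.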

So now we have commit C' with commit id a6db46299e550128fa8534dcc001f961ac4265c5. We need to edit D to get D' with parent C', which is just a simple search/replace operation:

git$ git cat-file -p 6352bde  | sed 's/25635b41c4e003279c17c0cc50bf1e565b36ecfb/a6db46299e550128fa8534dcc001f961ac4265c5/' | git hash-object -t commit --stdin -w
96b1aea0c6c75f53b0ae45658b8ffdb3921560c0

Lastly, we need to update the main branch to point to D' as the new HEAD:

git$ git update-ref refs/heads/main 96b1aea0c6c75f53b0ae45658b8ffdb3921560c0

Now let's see what we have:

git$ git log --graph --pretty='%h (%s)%n' --abbrev-commit --date=relative --branches --all --decorate
* 96b1aea (D)
|
*   a6db462 (C)
|\
| |
| * db65aa0 (B)
|
* 79a5602 (A)

I think that's what you were after.
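The one-line `a\text` form used with sed above is a GNU sed extension; BSD sed (for example on macOS) expects a newline after `a\`. A more portable sketch with awk, which also only touches the commit header so nothing in the message can be mistaken for a parent line (`add_parent` is a hypothetical helper name):

```shell
#!/bin/bash
# add_parent COMMIT NEW_PARENT   (hypothetical helper)
# Prints COMMIT's raw object with one extra "parent" line added as the last
# parent. Feed the result to git hash-object -t commit --stdin -w.
add_parent() {
    local parent
    parent=$(git rev-parse --verify "$2^{commit}") || return   # full id required
    git cat-file commit "$1" | awk -v p="$parent" '
        # In a commit header "author" always follows the tree and parent lines,
        # so printing the new parent right before it makes it the last parent.
        !done && /^author / { print "parent " p; done = 1 }
        { print }'
}
```

Its output can be piped into `git hash-object -t commit --stdin -w` exactly like the sed version, e.g. `add_parent 25635b4 feat | git hash-object -t commit --stdin -w`.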


I wrote out all the steps here in detail, which makes it look enormous compared to the solution from @ElpieKay, but when we distill it down to the crucial commands we get:

#!/bin/bash

git update-ref refs/heads/main "$(
    git cat-file -p "$C" |
    sed "/^parent / a\parent $B" |
    git hash-object -t commit --stdin -w
)"

git update-ref refs/heads/main "$(
    git cat-file -p "$D" |
    sed "/^parent / s/parent.*/parent $(git rev-parse main)/" |
    git hash-object -t commit --stdin -w
)"
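The script above assumes D is the only commit after C. If several commits follow C on the branch, the same idea generalizes: rewrite C, then walk its descendants with parents before children and swap every parent id that has already been rewritten. A sketch of that, assuming bash 4 or later for the associative array; `reparent` is a made-up name, and commits that do not descend from C are left alone:

```shell
#!/bin/bash
# reparent BRANCH COMMIT NEW_PARENT   (hypothetical helper, bash 4+)
# Adds NEW_PARENT as the last parent of COMMIT, rewrites the commits between
# COMMIT and the tip of BRANCH to point at the rewritten parents, then moves
# BRANCH. Trees, authors, committers and messages are copied unchanged.
reparent() {
    local branch=$1 commit parent old_tip c new_id
    commit=$(git rev-parse --verify "$2^{commit}") || return
    parent=$(git rev-parse --verify "$3^{commit}") || return
    old_tip=$(git rev-parse --verify "refs/heads/$branch") || return
    local -A map=()   # old commit id -> rewritten commit id

    # 1. COMMIT itself: insert the extra parent line just before "author".
    map[$commit]=$(git cat-file commit "$commit" |
        awk -v p="$parent" '!done && /^author / { print "parent " p; done = 1 } { print }' |
        git hash-object -t commit --stdin -w) || return

    # 2. Its descendants, parents first: swap parent ids that were rewritten.
    #    Only header lines (before the first empty line) are examined.
    for c in $(git rev-list --reverse --topo-order --ancestry-path "$commit..$old_tip"); do
        new_id=$(git cat-file commit "$c" | {
            header=1
            while IFS= read -r line || [ -n "$line" ]; do
                if [ "$header" = 1 ] && [[ $line == "parent "* ]]; then
                    p=${line#parent }
                    line="parent ${map[$p]:-$p}"
                fi
                [ -z "$line" ] && header=0
                printf '%s\n' "$line"
            done
        } | git hash-object -t commit --stdin -w) || return
        map[$c]=$new_id
    done

    # 3. Move the branch, but only if it still points where it did.
    git update-ref "refs/heads/$branch" "${map[$old_tip]}" "$old_tip"
}
```

With the sample repository, `reparent main 25635b4 feat` should produce the same commits as the two commands above. The final `git update-ref` is given the old tip as well, so it refuses to move the branch if it changed while the script was running.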

Set $B, $C and $D to the appropriate commit ids. If you're working with the sample repository created by the script earlier in the post, you can run the following (assuming you saved the script to a file named reparent-w-hash-object.sh):

eval $(git log --oneline --branches --pretty="%s=%H") \
  sh reparent-w-hash-object.sh
larsks
  • This is incredible! So we can update the record by hand, calculate its hash and put it in the database?! – Qwerty Feb 22 '23 at 02:11