Dealing with a file being renamed while having new content under its old name in Git

Question

This question begins where this one left off: Just like there, the situation is that, in one commit, a file A was renamed to B and new content was written to a file named A.

The linked question asks if there is a way to represent this rename in the commit and the accepted answer correctly points out that commits know nothing of renames and rename information is computed on the fly when needed, e.g. during diffs or rebases.

Given a commit as described above, git diff with default settings shows the change as a modification of A and creation of a new file B, no matter how much larger that makes the diff.

My questions are:

1. Is there a way to tune git diff's settings so it understands the situation more accurately, as would be evidenced by a much smaller diff?
2. Failing that, what is the most idiomatic way of splitting up the commit into two separate commits, one for the A → B rename* and one for the creation of the new A (to at least assist a human reviewer with understanding what happened if they go through commits one by one)?

* Note that I'll use "a commit for the A → B rename" as a shorthand for "a commit whose tree no longer contains A but contains B with the old contents of A".

Some hints / things I've tried:

For the first question, some improvement can be gained using git diff's -C/--find-copies switch. This will let it represent the change as A having been copied to B and then A having been modified such that its contents are replaced with the new ones. This halves the size of the "unnecessary" parts of the diff, which are then just the removal of A's "old contents". Still, it's not ideal.
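To reproduce the comparison, here is a minimal sketch in a throwaway repository. The file contents, commit messages and the `example.invalid` identity are made up for illustration; only the `git diff` invocations come from the description above.

```shell
set -eu

# Throwaway repository with a local identity so the commits succeed anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# First commit: A with its original contents.
seq 1 200 > A
git add A
git commit -q -m "add A"

# Second commit: A is renamed to B and brand-new content is written to A.
git mv A B
seq 1001 1200 > A
git add A
git commit -q -m "rename A to B, recreate A"

# Default settings: A is reported as modified and B as a new file.
default_status=$(git diff --name-status HEAD~ HEAD)
printf 'default:\n%s\n' "$default_status"

# With -C, B is reported as a copy of the old A; A is still a modification.
copy_status=$(git diff -C --name-status HEAD~ HEAD)
printf 'with -C:\n%s\n' "$copy_status"
```

Using `--name-status` instead of a full diff keeps the output short enough to compare the two classifications at a glance.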

For the second question, I guess the "brute force" way of doing it is:

1. Reset HEAD back to before the commit in question while keeping the modifications in the working tree, or if it's not the latest commit, do an interactive rebase with `e` at that commit, then reset HEAD during the rebase.
2. `git add B` to add B to the staging area.
3. `git rm --cached A` to add the removal of A to the staging area.
4. `git commit ...` those two changes, leaving A as an unstaged file in the working tree.
5. `git add A` and `git commit ...` to commit the creation of the new A.
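The steps above can be sketched as follows for the simple case where the combined commit is the latest one. The repository setup, file contents and commit messages are assumptions made up for the demonstration.

```shell
set -eu

# Throwaway repository containing the combined "rename and recreate" commit.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
seq 1 200 > A
git add A
git commit -q -m "add A"
git mv A B
seq 1001 1200 > A
git add A
git commit -q -m "rename A to B, recreate A"

# Undo the commit but keep its changes in the working tree (mixed reset).
git reset -q HEAD~

# Stage only the rename: B appears, A disappears from the index.
git add B
git rm -q --cached A
git commit -q -m "rename A to B"

# The new A is still in the working tree, now untracked; commit it separately.
git add A
git commit -q -m "create new A"

rename_status=$(git diff -M --name-status HEAD~2 HEAD~1)
create_status=$(git diff --name-status HEAD~1 HEAD)
printf 'first commit:\n%s\nsecond commit:\n%s\n' "$rename_status" "$create_status"
```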

This is quite cumbersome and I don't like that one has to leave A as an unstaged file in the working tree between the two commits. But trying to do something "clever" like rebasing off a commit with only the A → B rename or stashing the changed A when redoing the commit manually makes matters worse, with Git's rename detection kicking in and in both cases just getting rid of A altogether without warning. The most success I've had here is with rebasing using the old recursive merge strategy and no-renames option, which at least raises a conflict instead of silently getting rid of A.

As you noted, the way to get what you want is to create two commits. One is a `git mv A B`. The second is a `git add A`. Tuning `git diff` is not a good strategy because it applies only to you and not to anyone else who views your repo. To do the split, you can reset back to the starting point and then commit the two steps. To recover the modified A, you can do `git checkout COMMIT-WITH-NEW-A -- A` — Raymond Chen, Apr 22 '23 at 23:14
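A rough sketch of what this comment suggests, assuming the combined commit is the branch tip and contains nothing besides the rename and the new A. The repository setup and the `combined` variable are hypothetical names used only for this demonstration.

```shell
set -eu

# Throwaway repository containing the combined commit.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
seq 1 200 > A
git add A
git commit -q -m "add A"
git mv A B
seq 1001 1200 > A
git add A
git commit -q -m "rename A to B, recreate A"

# Remember the combined commit so the new A can be recovered from it later.
combined=$(git rev-parse HEAD)

# Go back to the starting point and record the pure rename first.
git reset -q --hard HEAD~
git mv A B
git commit -q -m "rename A to B"

# Recover the new A from the original commit and record it as the second step.
git checkout "$combined" -- A
git commit -q -m "create new A"

git log --format=%s
```

Because A is taken from the original commit rather than left lying around in the working tree, nothing untracked has to survive between the two commits.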

jthill · Accepted Answer · 2023-04-23T00:19:34.197

You can show Git's rename/copy detection what you're doing by splitting renames and copies into their own commits. Instead of renaming and punning all in one commit, do the rename, commit, create the new wine where the old bottle was, commit, and now Git sees the sequence (and that you care enough to record it; the fact that basically nobody does, despite it taking like five seconds, might be relevant here).

> Failing that, what is the most idiomatic way of splitting up the commit into two separate commits, one for the A → B rename* and one for the creation of the new A (to at least assist a human reviewer with understanding what happened if they go through commits one by one)?

If you're comfortable with Git, it's easy to split a commit in two. Say you've got

X---ABA---*---Y    topic

and you want to split ABA into a rename-and-recreate sequence.

git checkout X          # no need for a branch name here
git mv A B              # make the AB-rename-only commit
git commit              # ...
git replace --graft ABA @        # make ABA's parent be that locally
git filter-branch -- @..topic    # bake it in
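For anyone who wants to try this end to end, here is a self-contained sketch that builds a small `X---ABA---*---Y` history in a throwaway repository and then runs the commands above against it. The file contents, the "later work" commit message and the `x`/`aba` variables are made up for the demonstration. `FILTER_BRANCH_SQUELCH_WARNING` only suppresses the warning and delay that newer Git versions add to `git filter-branch`.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository whose first branch is called "topic".
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/topic
git config user.name "Example"
git config user.email "example@example.invalid"

# Build X---ABA---*---Y.
seq 1 200 > A
git add A
git commit -q -m "X"
git mv A B
seq 1001 1200 > A
git add A
git commit -q -m "ABA"
echo "more" >> B
git commit -q -am "later work"
echo "even more" >> B
git commit -q -am "Y"

x=$(git rev-parse topic~3)
aba=$(git rev-parse topic~2)

# The recipe from the answer: a rename-only commit on top of X ...
git checkout -q "$x"
git mv A B
git commit -q -m "rename A to B"

# ... becomes ABA's parent locally, then filter-branch makes that permanent.
git replace --graft "$aba" @
git filter-branch -- @..topic > /dev/null

git log --format=%s topic
```

The replacement ref created by `git replace` and the backup under `refs/original/` are left in place by this sketch; whether to delete them afterwards is a separate decision.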

That is the first time I've ever seen `git replace`. Is my understanding correct that `git replace --graft` followed by `git filter-branch` the way you did it is effectively a "dumber" kind of rebase that doesn't try to do any of the "smart" stuff a proper rebase would do to the subsequent commits and just bolts them on top of another commit without any changes instead? That's exactly what I was looking for here (and would've come in handy in similar situations in the past, too). — sh-at-cs, Apr 23 '23 at 01:21
Yup. This kind of thing is right up the filter-branch / replace alley. Note: the later commits will still be rewritten (with new IDs) because of the ancestry change. — jthill, Apr 23 '23 at 01:30
