You have three options
As far as Git is concerned, there is no such thing as a commit with a file move. A commit is just a snapshot: "This is what's in." That's it: no more, no less. In other VCSes, a new commit B that follows an old commit A is not just a snapshot of "what's in", it's also "what changed", possibly including things like "renamed path/to/file to different/path/to/newname". Git, however, chooses instead to (attempt to) reconstruct what changed, by—later, at the time you are looking at it—comparing the new contents of commit B to the old contents of commit A.
In general, Git steps back one commit at a time: compare Y-and-Z, then compare X-and-Y, then compare W-and-X, and so on. That's what git log and git blame do, for instance. Note that I've given the commits single letter names here, and assumed a linear sequence: A--B--C--...--Z. In practice we need longer IDs, and not all sequences are linear (but with any luck the sequences right near this problem are linear).
What this means for you is that you must convince Git not to compare commit H ("commit that, vs G, has files under new name") to commit G ("commit that when compared to F, deletes files under old name") but rather to compare commit H to commit F, skipping over G. In fact, perhaps we want to skip commit H as well, by comparing commit I (the one after H) to commit F (the one before G). That's less critical than skipping over the commit that has the files deleted.
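To see why the choice of comparison pair matters, here is a minimal sketch in a throwaway repository. The file names old.txt and new.txt, the commit messages, and the temporary directory are all made up for illustration; the three commits merely stand in for F, G and H above.

```shell
# Build a scratch repository with commits F (file under its old name),
# G (file deleted) and H (file re-added under a new name).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'line 1\nline 2\nline 3\n' > old.txt
git add old.txt
git commit -q -m "F: file under old name"
F=$(git rev-parse HEAD)

git rm -q old.txt
git commit -q -m "G: delete file"
G=$(git rev-parse HEAD)

printf 'line 1\nline 2\nline 3\n' > new.txt
git add new.txt
git commit -q -m "H: re-add file under new name"
H=$(git rev-parse HEAD)

# Adjacent pair: G has no file at all, so H can only show an addition.
g_vs_h=$(git diff --name-status -M "$G" "$H")
# Skipping G: F still has the old path, so rename detection can pair them up.
f_vs_h=$(git diff --name-status -M "$F" "$H")

echo "G vs H: $g_vs_h"
echo "F vs H: $f_vs_h"
```

Because the contents are identical in this toy case, the F-vs-H comparison reports a rename (status R100), while G-vs-H can only report a newly added file (status A).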
For all our options we need to know (or find) several of Git's commit IDs. The four "particularly interesting" commits are:
- The commit where "all files are added again": it's H above, but let's call it addaddaddaddaddaddaddaddaddaddaddaddadda (which is actually a potentially-valid Git hash ID). You will need to find the real ID.
- The commit where "all files are deleted". This is the parent of the above, so we can name it using the funny suffix-hat (^) syntax that Git provides, by writing addaddaddaddaddaddaddaddaddaddaddaddadda^. But let's just say we have the raw number as de1e7ede1e7ede1e7ede1e7ede1e7ede1e7ede1e.[1]
- We may also need to know the commit that comes after addaddaddaddaddaddaddaddaddaddaddaddadda. This is the one we called "I" above: as Git is traversing history in reverse, commit goodgoodgoodgoodgoodgoodgoodgoodgoodgood[2] leads Git to reach commit addaddaddaddaddaddaddaddaddaddaddaddadda, which leads Git to reach de1e7ede1e7ede1e7ede1e7ede1e7ede1e7ede1e, which of course leads Git to reach ...
- The commit before all the deletes. Again, we can use the hat syntax for this: in fact, knowing the "good" commit ID, we can just use goodgoodgoodgoodgoodgoodgoodgoodgoodgood^, then goodgoodgoodgoodgoodgoodgoodgoodgoodgood^^, then goodgoodgoodgoodgoodgoodgoodgoodgoodgood^^^, and so on. But I'll just use de1e7ede1e7ede1e7ede1e7ede1e7ede1e7ede1e^ for this one.
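One possible way to find these four IDs is sketched below, again in a scratch repository. The path new.txt, the commit messages, and the variable names add, del, before and good are assumptions for illustration only; in a real repository you would substitute one of the re-added paths, and the last step assumes the history between the "add" commit and HEAD is linear.

```shell
# Scratch history: F (before deletes), G (deletes), H (re-adds), I (next commit).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
commit() { git add -A && git commit -q -m "$1"; }

echo content > old.txt; commit "F: before the deletes"
git rm -q old.txt;      commit "G: delete everything"
echo content > new.txt; commit "H: add everything again"
echo more > other.txt;  commit "I: first commit after the re-add"

# The "add" commit: the most recent commit that added new.txt.
add=$(git log --diff-filter=A --format=%H -n 1 -- new.txt)
# Its parent is the "delete" commit; that commit's parent is the one before the deletes.
del=$(git rev-parse "$add^")
before=$(git rev-parse "$del^")
# The commit after "add": the oldest commit on the path from "add" to HEAD.
good=$(git rev-list --ancestry-path "$add..HEAD" | tail -n 1)

for id in "$good" "$add" "$del" "$before"; do
    git log -n 1 --format='%h %s' "$id"
done
```

The loop just prints each ID with its subject line so you can check that you picked the right commits before using them.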
Option 1: just tell git blame to skip the commit
You have several ways to do this, but for git blame in particular, you have one option that is not directly available in other Git commands:
-S <revs-file>
Use revisions from revs-file instead of calling git-rev-list(1).
The documentation for this option is poor (in my opinion): the -S file argument is not a revision list, but rather a graft list.
What this means is that instead of git blame <path>, you can run:
echo addaddaddaddaddaddaddaddaddaddaddaddadda \
$(git rev-parse de1e7ede1e7ede1e7ede1e7ede1e7ede1e7ede1e^) > \
/tmp/graft
git blame -S /tmp/graft file-you-are-concerned-with
(or similar, depending on your OS). See below for additional tricks, since you might want to skip the "add" commit too. Of course the two raw commit IDs here need to be the right ones.
(If you have the raw ID of the commit before the "delete" commit, you can use that instead of invoking git rev-parse. The nice thing about invoking rev-parse is that you can use abbreviated commits and thus get the full ones, plus of course all the usual gitrevisions syntax. The "echo" is to make sure both IDs are on the same line, as the -S file is handled the same way as the old Git grafts hack.)
Option 2: hide the commit more generally
If you want to hide the commit from most Git commands, you can do that more permanently in one repository (in a way that does not propagate elsewhere) using git replace:
git replace --graft \
addaddaddaddaddaddaddaddaddaddaddaddadda \
de1e7ede1e7ede1e7ede1e7ede1e7ede1e7ede1e^
What we are doing here is telling Git that whenever it's about to look at commit addaddaddaddaddaddaddaddaddaddaddaddadda it should turn its eyes[3] instead over to a new "replacement" commit. The git replace command makes the new replacement commit by mostly copying addaddaddaddaddaddaddaddaddaddaddaddadda, but changing its parent from de1e7ede1e7ede1e7ede1e7ede1e7ede1e7ede1e to de1e7ede1e7ede1e7ede1e7ede1e7ede1e7ede1e^, i.e., the commit that came just before the "delete things" commit.
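Since a replacement quietly changes what most commands see, it helps to know how to inspect it and how to back it out. The following is a minimal sketch in a scratch repository; the file names, commit messages and variable names are hypothetical, and the four commits stand in for F, G, H and I.

```shell
# Scratch history: F, G (deletes), H (re-adds), I.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
commit() { git add -A && git commit -q -m "$1"; }

echo content > old.txt; commit "F: before the deletes"
git rm -q old.txt;      commit "G: delete everything"
echo content > new.txt; commit "H: add everything again"
echo more > other.txt;  commit "I: first commit after the re-add"

add=$(git rev-parse HEAD^)       # H
del=$(git rev-parse HEAD^^)      # G
before=$(git rev-parse HEAD^^^)  # F

# Graft H directly onto F, hiding G.
git replace --graft "$add" "$before"

# List the commits that currently have replacements.
git replace -l

# With replacements honored, H's parent now appears to be F ...
seen_parent=$(git rev-parse "$add^")
# ... but the real object is untouched, as --no-replace-objects shows.
real_parent=$(git --no-replace-objects rev-parse "$add^")

# Undo: delete the replacement ref for H.
git replace -d "$add"
restored_parent=$(git rev-parse "$add^")

echo "seen:     $seen_parent"
echo "real:     $real_parent"
echo "restored: $restored_parent"
```

The original commit is never modified; the replacement lives in a separate ref, which is why deleting that ref restores the old view.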
Option 3: really delete the commit(s)
It is possible to discard one or even both intermediate commits. Let's say, for instance, we've decided to discard both addaddaddaddaddaddaddaddaddaddaddaddadda and its previous de1e7ede1e7ede1e7ede1e7ede1e7ede1e7ede1e. The drawback is that this effectively "re-numbers" every commit after that point: every commit starting from goodgoodgoodgoodgoodgoodgoodgoodgoodgood forward. The new, rewritten repository is no longer compatible with the old repository (and if you did your SVN-to-Git conversion with "notes" attached to each commit to remember the corresponding SVN revision, this process wrecks the notes).
To discard the two commits, start with the same git replace thing as before. This time, however, we want to replace goodgoodgoodgoodgoodgoodgoodgoodgoodgood itself, with a copy that is just like goodgoodgoodgoodgoodgoodgoodgoodgoodgood, except that its parent is the parent of de1e7ede1e7ede1e7ede1e7ede1e7ede1e7ede1e. Hence:
git replace --graft goodgoodgoodgoodgoodgoodgoodgoodgoodgood \
de1e7ede1e7ede1e7ede1e7ede1e7ede1e7ede1e^
Using our simple single-letter drawing again, what we've done is this:
           -------I'   <-- replacement for I
          /
A--...--E--F--G--H--I--J--...--Z   <-- HEAD
The graft makes Git jump from I to I' by "moving its eyes", so that it never sees H (the re-adds) nor G (the deletes) and jumps directly back to F.
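The effect of grafting I straight onto F can be checked before any rewriting is done. This is a small sketch in a scratch repository, with hypothetical file names, commit messages and variable names; it only sets up the graft and looks at the result.

```shell
# Scratch history: F, G (deletes), H (re-adds), I at HEAD.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
commit() { git add -A && git commit -q -m "$1"; }

echo content > old.txt; commit "F: before the deletes"
git rm -q old.txt;      commit "G: delete everything"
echo content > new.txt; commit "H: add everything again"
echo more > other.txt;  commit "I: first commit after the re-add"

good=$(git rev-parse HEAD)     # I
add=$(git rev-parse HEAD^)     # H
del=$(git rev-parse HEAD^^)    # G
before=$(git rev-parse "$del^") # F

# Graft I directly onto F, hiding both H and G.
git replace --graft "$good" "$before"

# History as most commands now see it, and as it really is.
visible_count=$(git rev-list --count HEAD)
real_count=$(git --no-replace-objects rev-list --count HEAD)
visible=$(git rev-list HEAD)

# Comparing I to its (grafted) parent F lets rename detection do its job.
changes=$(git diff --name-status -M HEAD^ HEAD)

echo "visible commits: $visible_count (really $real_count)"
echo "$changes"
```

If the visible history still contains the delete or re-add commit at this point, the graft is on the wrong commit and it is better to fix that before running any rewrite.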
Now that we have the graft in place, we can run git filter-branch --tag-name-filter cat -- --all (note the -- that separates filter-branch's own options from the --all that selects which references to rewrite). This obeys the graft while copying every commit it sees to new commits.[4] The copies "before" the replacement I' are bit-for-bit identical to their originals, so they retain their original hash IDs. The copy of I' remains I', but the copies after I' are different, so they get new IDs.
Once the filtering is done, the filter-branch command replaces the old branch and tag names with new branch and tag names pointing to the new copies. (The new tag names are the same as the old tag names, because our tag name filter was cat.)
[1] It's the Cyberman commit! You will be upgraded, or deleted!
[2] This is not a valid commit ID but there is a limit to what we can spell with [0-9a-f]. :-)
[3] Does Git even have eyes, or am I anthropomorphizing computers again?[5]
[4] While the identifying of commits is always done "backwards", from newest commits back to oldest, the copying that git filter-branch does is (necessarily) done "forwards". The way filter-branch works is to copy every commit, with the new copy made after applying any filters. This is why it is so slow. In our case we're doing the copy simply for its side effect of making replacements become permanent.
[5] "Don't anthropomorphize computers, they hate that." (author unknown)