52

I have a repository which I have already cloned from Subversion. I've been doing some work in this repository in its Git form and I would hate to lose that structure by cloning again.

However, when I originally cloned the repository, I failed to correctly specify the svn.authorsfile property (or the equivalent --authors-file option).

Is there any way I can specify the SVN author mappings now that the repository is fully Git-ified?

Preferably, I would like to correct all of the old commit authors to represent the Git author rather than the raw SVN username.

Lii
Daniel Spiewak

3 Answers

58

Start out by seeing what you've got to clean up:

git shortlog -s
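Since the rewrite also touches email addresses, it can help to see each distinct name together with its address. The following is a minimal sketch; `list_authors` is a hypothetical helper name, not a Git command, and it assumes it is run inside the repository to be cleaned up.

```shell
#!/bin/sh
# list_authors is a hypothetical helper name, not part of Git.
# It prints every distinct "Name <email>" author identity reachable from any ref.
list_authors() {
    git log --all --format='%an <%ae>' | sort -u
}
```

Calling `list_authors` in a git-svn clone made without an authors file would typically show entries whose address is the SVN username followed by `@` and the repository UUID.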

For each one of those names, create an entry in a script that looks like this (assuming you want all the authors and committers to be the same):

#!/bin/sh

git filter-branch --env-filter '

n=$GIT_AUTHOR_NAME
m=$GIT_AUTHOR_EMAIL

case ${GIT_AUTHOR_NAME} in
        user1) n="User One" ; m="user1@example.com" ;;
        "User Two") n="User Two" ; m="user2@example.com" ;;
esac

export GIT_AUTHOR_NAME="$n"
export GIT_AUTHOR_EMAIL="$m"
export GIT_COMMITTER_NAME="$n"
export GIT_COMMITTER_EMAIL="$m"
'

That's basically the script I used for a large rewrite recently that was very much as you described (except I had large numbers of authors).
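Before rewriting real history, the name mapping can be checked on its own as a plain shell function. This is only a sketch: `map_author` is a hypothetical helper, and the names and addresses are the same placeholders used in the script above.

```shell
#!/bin/sh
# map_author is a hypothetical helper; names and addresses are placeholders.
# Usage: map_author NAME EMAIL
# Prints "Name|email" for a known SVN username, or the input pair unchanged
# when no mapping exists.
map_author() {
    case "$1" in
        user1)      printf '%s\n' 'User One|user1@example.com' ;;
        "User Two") printf '%s\n' 'User Two|user2@example.com' ;;
        *)          printf '%s\n' "$1|$2" ;;
    esac
}
```

Running it over every name reported by `git shortlog -s` shows which authors would still fall through to the default branch.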

Edit: User π pointed out a quoting problem in my script. Thanks!

Dustin
  • 1
    Should be export GIT_AUTHOR_NAME="$n" or only the author's first name will end up in the index! – pi. Feb 19 '09 at 15:07
  • 4
    This script works fine. However, after I had it applied, a call to "git svn rebase" causes the error message: "Unable to determine upstream SVN information from working tree history". – olenz Jul 14 '11 at 09:23
  • How do you then go and push the edited/corrected authors back to the remote? – user1027169 May 20 '12 at 23:41
  • I am afraid to try this because of the comment by @olenz . Anyone else have success with this after `git svn rebase`? – Spencer Williams Aug 30 '16 at 01:41
11

git filter-branch can be used to rewrite large chunks of history.

In this case, you would probably do something like (totally untested):

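git filter-branch --env-filter '
    # Anchor each pattern with ^ and $ so one SVN name cannot match inside another.
    GIT_AUTHOR_NAME=$(echo "${GIT_AUTHOR_NAME}" | sed -e "s/^svnname1\$/Right Name/; s/^svnname2\$/Correct Name/")
    GIT_COMMITTER_NAME=$(echo "${GIT_COMMITTER_NAME}" | sed -e "s/^svnname1\$/Right Name/; s/^svnname2\$/Correct Name/")
    # Assumes git-svn recorded each email as svnname@repository-uuid, so match the whole address.
    GIT_AUTHOR_EMAIL=$(echo "${GIT_AUTHOR_EMAIL}" | sed -e "s/^svnname1@.*\$/m@i.l/; s/^svnname2@.*\$/correct.name@e.mail/")
    GIT_COMMITTER_EMAIL=$(echo "${GIT_COMMITTER_EMAIL}" | sed -e "s/^svnname1@.*\$/m@i.l/; s/^svnname2@.*\$/correct.name@e.mail/")
    # Re-export so the new values reach git commit-tree.
    export GIT_AUTHOR_NAME GIT_COMMITTER_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_EMAIL
'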
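Whether a sed pattern is anchored matters as soon as one SVN username is contained in another. The sketch below uses two hypothetical usernames, `bob` and `bobby`, and an invented replacement name to show the difference; it runs without a repository.

```shell
#!/bin/sh
# Hypothetical SVN usernames: "bob" is a prefix of "bobby".
unanchored() { printf '%s\n' "$1" | sed -e 's/bob/Robert Smith/'; }
anchored()   { printf '%s\n' "$1" | sed -e 's/^bob$/Robert Smith/'; }

unanchored bobby   # prints "Robert Smithby": the wrong author is rewritten
anchored bobby     # prints "bobby": left alone
anchored bob       # prints "Robert Smith"
```

With `^` and `$` in place, each substitution only fires when the whole variable equals the SVN name.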

As always, the following applies: in order to rewrite history, you need a conspiracy.

Jörg W Mittag
  • Upvoted for: "As always, the following applies: in order to rewrite history, you need a conspiracy." Very well said. (Although the link doesn't load anymore) – Matt D Mar 05 '12 at 19:36
  • You would run into issues with the given regexes if you have an svn name that is a subset of another svn name... This is why god gave us `^` and `$`. – Dan Dec 07 '15 at 09:27
  • after the changes, don't you need to export the GIT_ variables back to the env? – FlipMcF May 15 '17 at 22:17
3

You probably want to look into git-filter-branch, specifically the --commit-filter option. This command is a powerful chainsaw that can rewrite your entire repository history, changing whatever you might want to change.

Note that when you do this, you should pull new clones from the updated repository since the SHA1 hashes of every commit may have changed.
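To see why the hashes change, note that the author line is part of the commit object that Git hashes. The following sketch hashes two otherwise identical commit objects that differ only in author identity. `commit_id` is a hypothetical helper, the identities are placeholders, the tree ID is the well-known empty tree, and SHA-1 object names are assumed.

```shell
#!/bin/sh
# commit_id is a hypothetical helper: it hashes a minimal commit object for the
# given identity without writing anything to a repository.
# 4b825dc6... is the SHA-1 ID of the empty tree.
commit_id() {
    printf 'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\nauthor %s 1234567890 +0000\ncommitter %s 1234567890 +0000\n\nExample commit\n' "$1" "$1" |
        git hash-object -t commit --stdin
}

old=$(commit_id 'user1 <user1@example.invalid>')
new=$(commit_id 'User One <user1@example.com>')
[ "$old" != "$new" ] && echo "author change produced a different commit ID"
```

Because each commit also records its parent's ID, a change to one early commit changes the ID of every commit that descends from it.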

Greg Hewgill