
We accepted a pull request from a user with characters in their author name that caused problems in some git tools. In an attempt to fix this, we followed the instructions here to revise the commit history: https://help.github.com/articles/changing-author-info/

Now most of our commits are duplicated: https://github.com/CreateJS/EaselJS/commits/master/

I've found potential solutions, such as the question "git filter-branch duplicated commits".

However, I'm not seeing another branch, just duplicate commits, and I made another commit before we caught this. I want to be really sure we don't make this any worse. Any help is appreciated.

gskinner

1 Answer


It was not clear from the direct link to GitHub, but I cloned the repo and this made it more obvious:

  • You did the filter-branch itself as per instructions. This made a whole bunch of new commits.

  • But then you (it looks like you) merged the old chain of commits with the new chain of commits (plus one additional commit, "Ticker clean up and bug fixes"), instead of force-overwriting the old repository to discard all the old commits.

If you view them by date, this makes it look like all the commits are duplicated (because they are, below the point at which you merged the two chains).

If you view them by graph topology order, you can see more easily how this came about (note, this is truncated on the right to fit a window):

*   f721f0c (HEAD, origin/master, origin/HEAD, master) Merge branch 'master' of 
|\  
| * 97ffb07 Documentation updates.
| * 3e777f2 Update docs and VERSIONS.
| * 1f32407 Swapped append/prepend naming.
| * dadd1c9 Fixed example in Graphics.append() docs.
| :
| :      [massive snippage, graph modified to show connections]
| :
| *   d0d7f36 Merge pull request #165 from julianklotz/master
| |\  
| | * e980526 Fix documentation Bug: CSS font attribute
| |/  
* | 4c4fe1a Ticker clean up and bug fixes.
* | 6738d23 Documentation updates.
* | 4fe4e97 Update docs and VERSIONS.
* | c17aa05 Swapped append/prepend naming.
* | 36dadb6 Fixed example in Graphics.append() docs.
: |
: |     [more snippage]
: |
* | 0960c56 Added Touch.disable() method. Fixed a very rare issue where Touch co
* |   c20ae9d Merge pull request #165 from julianklotz/master
|\ \  
| |/  
|/|   
| * 26eb5dd Fix documentation Bug: CSS font attribute
|/  
* a21f210 Added SpriteSheetBuilder demo with MovieClip source.
* 20d5dc7 Improvements to resolving mouse position on stage. Should now support 
:      [yet more snippage]

What you need to remember here is that filter-branch, like everything in git, does not change any existing commits; it only adds new commits. When the filter(s) make some change, the filter-branch script makes a new (different) commit that is like the old commit, but has that change applied. So a21f210 and below are unchanged, but 26eb5dd is different from e980526: they have the same tree and the same parent ID (a21f210), but different commit message texts (one ends with a newline, the other does not).
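To make the newline point concrete, here is a minimal Python sketch of how git derives a commit's ID from its raw contents. The tree used is git's empty tree; the parent ID and the author/committer line are hypothetical placeholders, since the full IDs and metadata of the real commits are not shown above.

```python
import hashlib

def git_object_id(kind, body):
    """SHA-1 over a '<kind> <size>' header, a NUL byte, then the body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree, parent, who, message):
    """Raw commit text: header lines, a blank line, then the message."""
    headers = [
        f"tree {tree}",
        f"parent {parent}",
        f"author {who}",
        f"committer {who}",
    ]
    return ("\n".join(headers) + "\n\n" + message).encode()

EMPTY_TREE = git_object_id("tree", b"")  # ID of git's empty tree
PARENT = "a" * 40  # hypothetical stand-in for the full ID of a21f210
WHO = "A U Thor <author@example.com> 1400000000 +0000"  # hypothetical

msg = "Fix documentation Bug: CSS font attribute"
without_newline = git_object_id(
    "commit", commit_body(EMPTY_TREE, PARENT, WHO, msg))
with_newline = git_object_id(
    "commit", commit_body(EMPTY_TREE, PARENT, WHO, msg + "\n"))

# Same tree, same parent, same author: only the trailing newline differs.
print(without_newline[:7], with_newline[:7], without_newline == with_newline)
```

The two abbreviated IDs printed differ even though the tree, parent, and author are identical, which is the situation with 26eb5dd and e980526.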

Once some commit is different, every descendant of that commit (every commit "after" it in the graph: its children, their children, and so on) must also be different, because each commit stores the ID(s) of its parent(s) as part of its own contents, and its ID is computed from those contents. (The ID of any git object is a cryptographic checksum of the contents of that object, and the contents of a commit include the parent IDs.)
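The knock-on effect can be seen in a deliberately simplified model, sketched here in Python. This is not git's real object format: the toy ID is just a hash over the parent's ID and the message, and the commit messages are shortened versions of those in the graph above.

```python
import hashlib

def toy_commit_id(parent_id, message):
    """Toy commit ID: a hash over the parent's ID and the message."""
    return hashlib.sha1(f"parent {parent_id}\n\n{message}".encode()).hexdigest()

def build_chain(messages):
    """Return the IDs of a linear history, oldest commit first."""
    ids, parent = [], ""
    for message in messages:
        parent = toy_commit_id(parent, message)
        ids.append(parent)
    return ids

messages = [
    "Added SpriteSheetBuilder demo",
    "Fix documentation Bug: CSS font attribute",
    "Added Touch.disable() method",
    "Documentation updates.",
]
original = build_chain(messages)

# "Filter" only the second commit: its message gains a trailing newline.
filtered = list(messages)
filtered[1] += "\n"
rewritten = build_chain(filtered)

for old, new, message in zip(original, rewritten, messages):
    status = "unchanged" if old == new else "rewritten"
    print(old[:7], new[:7], status, message)
```

Only the first commit keeps its ID. The second changes because its message changed, and the third and fourth change only because their parent's ID changed.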

If/when you push such a filtered result to a shared repository like GitHub, every user of that shared repository must re-adjust their copy based on the filtering. Assuming 6738d23 Documentation updates. is what was in the shared repository before the filtering, everyone else has 6738d23, but you (in your push) create 97ffb07 Documentation updates., and everyone else must take any commits they have that are descendants of 6738d23 and rebase them onto 97ffb07. That includes whoever had 4c4fe1a Ticker clean up and bug fixes. (which is apparently you). Then they must stop using 6738d23 and start using 97ffb07 instead.

If the assumption above is wrong (if 97ffb07 is what was in the shared repo before filtering, and 6738d23 is the new one), then everyone must stop using the 97ffb07 version and start using the 6738d23 one instead.

This "everyone must stop using A and start using B" requirement is what makes rewriting history so painful, and why it should be avoided when possible. If someone misses the "stop using A, start using B" instructions, they are likely to create a merge, as you did, that brings the old "A and all its parents" history back in, making it seem (and be) a duplicate of all the new "B and all its parents" history.

To fix this, you must get rid of the merge, i.e., re-write history again.
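As a rough illustration of why the merge is the problem, here is a toy commit graph in Python. All IDs (`base`, `old1`, `merge`, and so on) are hypothetical labels, and the layout follows the assumption above that the extra "Ticker clean up and bug fixes." commit sits on the old chain. It models reachability only; it is not a recipe of git commands.

```python
from collections import Counter

# Toy commit graph: ID -> (parent IDs, message). All IDs are hypothetical.
graph = {
    "base":  ([], "Added SpriteSheetBuilder demo"),
    "old1":  (["base"], "Fix documentation Bug"),
    "old2":  (["old1"], "Documentation updates."),
    "extra": (["old2"], "Ticker clean up and bug fixes."),
    "new1":  (["base"], "Fix documentation Bug"),
    "new2":  (["new1"], "Documentation updates."),
    "merge": (["extra", "new2"], "Merge branch 'master'"),
}

def reachable_messages(graph, tip):
    """Messages of every commit reachable from tip (what a log walk visits)."""
    seen, stack, messages = set(), [tip], []
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        parents, message = graph[commit]
        messages.append(message)
        stack.extend(parents)
    return messages

def duplicated(graph, tip):
    """Messages that appear more than once in the history of tip."""
    counts = Counter(reachable_messages(graph, tip))
    return sorted(m for m, n in counts.items() if n > 1)

# With the branch at the merge, both chains are reachable.
print(duplicated(graph, "merge"))

# Model of the fix: copy the extra commit onto the new chain and point
# the branch there, so the merge and the old chain become unreachable.
graph["extra_copy"] = (["new2"], "Ticker clean up and bug fixes.")
print(duplicated(graph, "extra_copy"))
```

The first list contains every message from the filtered range, and the second is empty. In a real repository, the equivalent step has to be followed by force-overwriting the shared branch and by everyone else switching to the new commits, or the old chain will simply be merged back in again.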

torek
  • `To fix this, you must get rid of the merge, i.e., re-write history again.` But how? What exactly needs to be done? I keep finding questions like these, and the answers say what you should have done, but that doesn't help when it has already been done. – Douglas Gaskell Sep 29 '17 at 22:06
  • The "how" part can be done in multiple different ways. The most important thing to realize is that *once you have accomplished the task, it's trivially easy to undo all your work and make a bigger mess* because of the nature of what you are doing, i.e., rewriting history by copying. Git "wants" to add new commits to its collection. Think of Git like the Borg from Star Trek TNG: it will just say "Oh these look new, I'll add them to my collection!" Suppose, for instance, that you're working with another guy named Fred, and you both have a Git repo (you have one, he has one) and you share work... – torek Sep 29 '17 at 23:24
  • ... and you do a `git filter-branch` in your repo. This copies your commits, making some changes during the copying process (these are the "filters" in "filter-branch"). Then Fred's Git connects to your Git and thinks: *Wow, look at all those shiny new commits, I'll add them to my collection!* Now Fred has double the commits, and you connect your Git to Fred's Git and your Git says: *Wow, look at all those shiny new commits, I'll add them!* Now both of you have double the commits. So "rewriting history" is hard: you must make sure *everyone* switches to the new commits. @DouglasGaskell – torek Sep 29 '17 at 23:26