Migrating a large, divergent TFS Team Project to Git

Question

I have a large-ish TFS team project.

After battling with Git-TFS (we have some funky stuff in our TFS Team Project) I have a full local git repo.

It is too big to fit into the BitBucket 1GB soft limit.

The Team project contains branches that are divergent products.

-- Base Product (trunk)
--- Client A Product (from trunk)
--- Client B Product (from trunk)
---- Client B Feature Branch (from B)
--- Client C Project (from trunk)
---- Client D Project (from C)
----- Client E Project (from D)

As you can see, we haven't been kind to ourselves when branching in TFS.

A shallow clone shows that a single commit for any branch is about 150-200MB. Full history for any given branch is just under 1GB.

I am proposing a git repo per branch, pushing only the branch history since the branch commit. This would mean that no branch has a common ancestor, forcing baseless merges whenever we want to merge across the former TFS branches. I am also proposing to store a read-only full historical repo, produced by an aggressive GC and the removal of some big objects, which allows me to squeeze the whole lot into a single repo. This at least opens up the possibility of doing a graft or a replace+rebase to join the 'current' repos with the historical one at some point in the future.
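For reference, the "join later" idea could look roughly like the sketch below. It only builds and runs the git commands for a graft via `git replace --graft`. The URL, the commit ids and the remote name `history` are placeholders, not values from this project, and the helper names are invented for illustration.

```python
import subprocess


def graft_commands(history_url, current_root, history_tip, remote="history"):
    """Build the git invocations that splice old history under a trimmed repo.

    All arguments are placeholders: history_url is wherever the read-only
    archive lives, current_root is the first commit of the trimmed repo and
    history_tip is the archive commit it should descend from.
    """
    return [
        ["git", "remote", "add", remote, history_url],
        ["git", "fetch", remote],
        # Record a replacement so current_root appears to have history_tip
        # as its parent; the original commit objects are left untouched.
        ["git", "replace", "--graft", current_root, history_tip],
    ]


def run_all(commands, repo_dir):
    """Run each command inside repo_dir, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=repo_dir, check=True)
```

Replacement refs live under refs/replace/ and are not transferred by a default push or clone, so the day-to-day repos would stay small and only people who fetch the archive and create the graft would see the joined history.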

I cannot cleanly cut history (and rebase) at any point to provide sensible common ancestry and repo headroom under the 1GB limit.

Can anyone help with a better migration strategy?

UPDATE 1: the sub-text to this question is: when products diverge, how important is the branch structure? A significant issue we have is the merge commit relationships between branches. If I trim history, I am also forced to dispose of the merge commit history in some cases (because we have done bonkers merges from early portions of one branch to late portions of another).

UPDATE 2: I have another strategy that dispenses with all of the merge history but retains the original parent branch ancestry. Git TFS quick clone with the -c option to create a start point at the desired point in time, then Git TFS pull --rebase --all. Then init a descending branch with Git TFS branch --init [branch name], then pull again, and so on.

This gives the common ancestry and dispenses with the merge commits, which allows for a smaller repo, but at the expense of the merge history.

You should be able to push your Git repo into a Team Project configured for Git in VSO! – MrHinsh - Martin Hinshelwood, Mar 05 '15 at 05:17
@MrHinsh thanks, but that's the wrong way around. I am migrating from TFS to Git. – trickbooter, Mar 05 '15 at 10:47
That's what I said. TFS is not a source control system, so I assume that you want to move from TFVC to Git. Git as provided in VSO does not have the same limitations as other Git repositories that are not designed for enterprise scale. – MrHinsh - Martin Hinshelwood, Mar 05 '15 at 11:57
@MrHinsh aha, I see what you are getting at now. Apologies. We are using BitBucket as our corporate choice, and some departments have already moved over. It is unlikely that I can convince anyone to use another store. – trickbooter, Mar 06 '15 at 06:22

Philippe · Accepted Answer · 2015-03-06T10:02:40.797


It's difficult to answer without knowing what you have in your repository and what you want to keep.

But if you have a working directory of about 100-200MB, you must have a lot of binaries.

I'm sure that the first step is to remove all the binaries you can, using the very good and easy-to-use tool BFG Repo-Cleaner.

Then, you will see if the size of your repository is still a problem.
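To decide what is worth stripping, it can help to list the largest blobs in the history first. The following is a minimal sketch of one way to do that, assuming git is on the PATH. It is not part of BFG, and the function names are invented for illustration.

```python
import subprocess


def parse_blob_sizes(lines):
    """Yield (size, path) for each 'blob <sha> <size> <path>' line."""
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees and tags
        path = parts[3] if len(parts) == 4 else ""
        yield int(parts[2]), path


def largest_blobs(lines, top=20):
    """Return the `top` biggest blobs as (size, path), largest first."""
    return sorted(parse_blob_sizes(lines), reverse=True)[:top]


def scan_repo(repo_dir, top=20):
    """List the biggest blobs reachable from any ref in repo_dir."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    sized = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        cwd=repo_dir, input=objects, check=True,
        capture_output=True, text=True,
    ).stdout
    return largest_blobs(sized.splitlines(), top)


if __name__ == "__main__":
    for size, path in scan_repo("."):
        print(f"{size:>12}  {path}")
```

The sizes reported are uncompressed object sizes, so they overstate what each blob costs inside a pack, but the ranking is usually enough to pick candidates. BFG can then remove them, for example with its --strip-blobs-bigger-than option, followed by a reflog expire and an aggressive gc to actually reclaim the space.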

PS: keep a backup of your repository before rewriting history. At the very least, it can serve as your read-only reference repository in the end, if you need one.

Edit: in fact, I googled this limit on Bitbucket and found this page with 2 very interesting links: "How to handle big repositories with git" and "Reduce repository size".

edited Mar 06 '15 at 10:02

answered Mar 04 '15 at 08:43

Philippe


There are 40MB of binaries. There are about 20MB of www assets (images and the like). There is about 50MB of test data for db unit testing (this is the likely candidate for rework). – trickbooter Mar 05 '15 at 10:49
I have marked this as the answer after further experimentation. The situation that arises when doing merges between branches that have lost their merge history is pretty hard to deal with, and will stick around for a long time (until each file has seen a merge). As such, I think that BFG and using external stores like S3 are probably the best answer for my needs. Many thanks – trickbooter Mar 06 '15 at 06:23
