41

I have a big repository, 100,000+ revisions with a very high branching factor. The initial fetch of the full SVN repository using git-svn has been running for around 2 months and it's only up to revision 60,000. Is there any way to speed this thing up?

I'm already regularly killing and restarting the fetch due to git-svn leaking memory like a sieve. The transfer is occurring over the local LAN, so link speed shouldn't be an issue. The repository is on a dedicated machine backed by dedicated fiber channel arrays, so the server should have plenty of oomph. The only other thing I can think of is to do the clone from a local copy of the SVN repository.

What have other people done in similar circumstances?

MrEvil
  • "I'm already regularly killing and restarting the fetch due to git-svn leaking memory like a sieve" -- just a wild guess here, but a `git gc` and `git svn gc` from time to time might be helpful too. – Tyler Dec 09 '10 at 09:30

8 Answers

25

At work I use git-svn against a ~170000 revision SVN repo. What I did was use git-svn init + git-svn fetch -r... to limit my initial fetch to a reasonable number of revisions. You must be careful to choose a revision that is actually in the branch you want. Everything is fully functional even with truncated history except git-blame, which obviously attributes all the lines older than your starting rev to the first rev.

You can further speed this up with ignore-paths to prune out subtrees that you don't want.
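A minimal sketch of that truncated fetch, assuming a standard trunk/branches/tags layout. The function name `shallow_clone`, the example URL, the revision number, and the ignore pattern are all hypothetical placeholders; only `git svn init --stdlayout`, `git svn fetch -r` and `--ignore-paths` are real git-svn options.

```shell
#!/bin/sh
# Sketch of a shallow git-svn clone; every argument is a placeholder.
shallow_clone() {
    svn_url=$1          # repository root (hypothetical URL in the example below)
    start_rev=$2        # must be a revision that exists on the branch you want
    target_dir=$3       # directory for the new Git repository
    ignore_re=${4:-}    # optional Perl regex of paths to skip

    # Assumes the standard trunk/branches/tags layout.
    git svn init --stdlayout "$svn_url" "$target_dir" || return 1

    (
        cd "$target_dir" || exit 1
        if [ -n "$ignore_re" ]; then
            git svn fetch -r "$start_rev:HEAD" --ignore-paths="$ignore_re"
        else
            git svn fetch -r "$start_rev:HEAD"
        fi
    )
}

# Example (not run here):
# shallow_clone https://svn.example.com/repo 150000 repo-git '^trunk/docs/'
```

Check the ignore pattern against the paths in your own repository before relying on it; the one shown is only illustrative.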

You can add more revisions later, but it will be painful. You will have to reset the rev-map (sadly I even wrote git-svn reset and I can't say offhand if it will remove all revisions, so it may be by hand). Then git-svn fetch more revisions and git-filter-branch to reparent your old root to the new tree. That will rewrite every commit but it won't affect the source blobs themselves. You have to do similar surgery when people undertake big reorgs of the svn repo.

If you actually need all of the revisions (for example for a migration) then you should be looking at some flavor of svn-fast-export + git-fast-import. There may be one that adds rev tags to match git-svn, in which case you could fast-import and then just graft in the svn remote. Even if the existing svn-fast-export options don't have that feature, you can probably add it before your original clone completes!

Ben Jackson
14

Apparently there is no good answer. Some work is being done on git-fast-import but it isn't ready for prime time yet. They are still trying to figure out how to detect and represent 'svn cp' actions. The one bright spot is that someone on the list came up with an optimization for git-svn that seems to have made a big impact.

http://permalink.gmane.org/gmane.comp.version-control.git/168718

MrEvil
6

If you can find a server with enough RAM, do the whole clone operation on a ramdisk. On Linux systems you can use /dev/shm, which is backed by RAM.

> svnadmin hotcopy /path/to/svn/repo /dev/shm/svn-repo

> git svn clone file:///dev/shm/svn-repo /dev/shm/git-repo

Once that's done, you can point the git repo back to your real svn repo instead as described here: https://git.wiki.kernel.org/index.php/GitSvnSwitch

  • Edit the svn-remote url in .git/config to point to the new domain name
  • Run git svn fetch - This needs to fetch at least one new revision from svn!
  • Change svn-remote url back to the original url
  • Run git svn rebase -l to do a local rebase (with the changes that came in with the last fetch operation)
  • Change svn-remote url back to the new url
  • Run git svn rebase; it should now work again!

This only works if the git svn fetch step actually fetches something! (It took me a while to discover that... I had to commit a dummy revision to our svn repository to make it happen!)
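The URL switch above can be scripted with `git config` instead of editing .git/config by hand. This is a rough sketch, assuming the remote uses git-svn's default name `svn` (so the key is `svn-remote.svn.url`); the function name `switch_svn_url` and the example URLs are hypothetical.

```shell
#!/bin/sh
# Sketch: repoint a git-svn clone from old_url to new_url.
# Assumes the svn-remote section is called "svn" (the git-svn default).
switch_svn_url() {
    old_url=$1   # URL the clone was made from, e.g. file:///dev/shm/svn-repo
    new_url=$2   # URL of the real server

    git config svn-remote.svn.url "$new_url" &&
    git svn fetch &&                 # must bring in at least one new revision
    git config svn-remote.svn.url "$old_url" &&
    git svn rebase -l &&             # local rebase onto what was just fetched
    git config svn-remote.svn.url "$new_url" &&
    git svn rebase
}

# Example (not run here):
# switch_svn_url file:///dev/shm/svn-repo https://svn.example.com/repo
```

The `&&` chain stops at the first failing step, so a fetch that fails leaves the URL pointing at the new server for you to inspect.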

I just did this and was able to clone a 4.7G 12000 revision svn repo to git in about 3 hours.

bengineerd
5

I had similar problems in a repository with 20k commits. In my case it turned out that a few strange tags in Subversion were causing trouble: some tags had copied / instead of /trunk. That caused git svn fetch to go into an infinite loop. I fixed it by converting in chunks.

git svn fetch -r0:1000
git svn fetch -r0:2000
git svn fetch -r0:3000

Watch the output, and if you don't see a new r... line once in a while, then something is wrong. Use git log --all to see how far the conversion got. Let's say you got to 1565. Then continue the fetch like this.

git svn fetch -r1567:2000

It was very tedious but it got the job done.
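The chunked fetch can be wrapped in a small loop. This is only a sketch: the function name `fetch_in_chunks` and the numbers in the example are hypothetical, and it has to be run inside an already initialised git-svn repository. It issues the same cumulative `-r0:N` ranges as above and stops at the first range that fails. It cannot detect a fetch that hangs, so you still need to watch the output.

```shell
#!/bin/sh
# Sketch: fetch revisions 0..last in cumulative steps of `step`.
fetch_in_chunks() {
    last=$1    # highest revision to fetch
    step=$2    # chunk size, e.g. 1000
    upper=$step

    while [ "$upper" -lt "$last" ]; do
        git svn fetch -r "0:$upper" || return 1
        upper=$((upper + step))
    done
    # Final range ends exactly at the last revision.
    git svn fetch -r "0:$last"
}

# Example (not run here):
# fetch_in_chunks 20000 1000
```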

Tobias Tobiasen
  • This was quite helpful. I'll point out that if you run one of the `-r0:1000` and you don't see any output at all, it appears to have already done that section. Run `git log --all` and start from a later SVN commit. Hasn't finished a checkout yet, but I'm hoping it all went well. :) – Louis St-Amour Mar 21 '16 at 18:14
3

I have a repo with 8k+ revisions and around 240 tags. I estimated that my initial git svn clone on Windows would have taken months, simply doing

git svn clone --stdlayout --no-metadata --authors-file=users.txt https://link.to.repo

The clone was taking 5 seconds to import 1 revision on average. Please notice that whenever a tag is encountered, the clone restarts from rev 1, so potentially there are 8k * 240 = 1.92 million operations, which at 5 seconds each comes to about 111 days.
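The estimate is easy to reproduce with shell arithmetic. The figures are the ones quoted above (8,000 revisions, 240 tags, 5 seconds per revision); the variable names are just for illustration, and the result is a worst case that assumes every tag replays the full history.

```shell
#!/bin/sh
# Worst-case estimate: every tag replays all revisions from rev 1.
revisions=8000
tags=240
secs_per_rev=5

ops=$((revisions * tags))               # 1920000 revision imports
total_secs=$((ops * secs_per_rev))      # 9600000 seconds
days=$((total_secs / 86400))            # integer division, 86400 s per day

echo "$ops operations, about $days days"   # prints: 1920000 operations, about 111 days
```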

Summary of all the steps I took to speed up the process:

  1. The Linux and OS X implementations are much faster than Cygwin on Windows, so I used a Linux virtual machine. Please check https://stackoverflow.com/a/21599759/1448276

  2. I copied the entire svn repo to my machine with svnrdump

svnrdump dump https://link.to.repo > repos.dump

  3. I created a local svn repo

    svnadmin create svnrepo

    svnadmin load svnrepo < repos.dump

as in https://stackoverflow.com/a/10407464/1448276

  4. I created and mounted a RAM-based disk

    svnadmin hotcopy svnrepo/ /dev/shm/svnrepo

as above, https://stackoverflow.com/a/39030862/1448276

  5. And finally ran the clone

    git svn clone --stdlayout --no-metadata --prefix=origin/ --authors-file=users.txt file:///dev/shm/svnrepo

Here the clone is processing on average 12.5 revisions per second, so I expect it will take less than 2 days. I'll post an update once the clone is complete.

wollow
  • It took less than two days indeed. But then I had to do it again, and this time I used svn2git (I am referring to this: https://github.com/svn-all-fast-export/svn2git). Done in 5 minutes :) – wollow Oct 12 '17 at 19:52
1

I think you are on the right track.

Local file access could give you a speedup of one to two orders of magnitude.

I'm not sure whether running git svn against a BDB or a file-based (FSFS) svn backend would be faster.

kevpie
1

I've downloaded a close-to-100,000-revision SVN repository using git-svn before. It took around 48 hours and was not over a local LAN. Admittedly, you did say that your repository has a high branching factor, while the repository I downloaded did not (although it did have several dozen branches).

I'd suggest working on figuring out where the bottleneck lies. Are git-svn and its subprocesses using 100% CPU? Are the disk lights on the client or the SVN server constantly lit? How much bandwidth is being used? Once you know what the limiting factor is, you can work on figuring out how to fix it.
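As a starting point for the CPU question, a filter like the one below picks the git-svn processes out of `ps` output so you can see their CPU share and resident memory. It is a sketch: the function name `filter_git_svn` is hypothetical, and it assumes the process command line contains "git-svn" or "git svn", which may not hold on every platform.

```shell
#!/bin/sh
# Keep the header line plus any process whose command line mentions
# "git-svn" or "git svn". Reads ps-style output on stdin.
filter_git_svn() {
    awk 'NR == 1 || /git[- ]svn/'
}

# Example (not run here): CPU share and resident memory of git-svn processes.
# ps -A -o pid,pcpu,rss,args | filter_git_svn
```

If those processes sit near 100% CPU, the client is the bottleneck; if they are mostly idle, look at disk and network activity on both ends instead.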

Daniel Stutzbach
  • 2
    We have at least several hundred branches, and whenever git-svn encounters a branch it wants to replay the entire history r0-rwhatever. – MrEvil Oct 13 '10 at 20:09
  • 1
    @MrEvil: After some digging with Google, it sounds like that was a problem in older versions of Git, but it shouldn't replay the entire history for each branch in the latest version. I haven't verified that myself. Which version are you running? – Daniel Stutzbach Oct 13 '10 at 21:52
  • 1
    1.7.0.3. I'm making a local mirror of my SVN repository right now using svnsync. I've only been at it for about 4 hours and I'm already at the 60k mark. I'm going to try: http://github.com/barrbrain/svn-dump-fast-export – MrEvil Oct 13 '10 at 22:37
  • Could you provide the URL to the article you dug out? I'm interested to know which previous versions had this problem - I have 1.7.1 and the git-svn fetch is painfully slow. – Aleksander Adamowski Feb 11 '11 at 15:06
1

2017 calling in. I'm migrating a 45k revision repo and I'm finding git-svn on Linux working about 10x faster than git-svn on my Windows box. The VM is on the same Hyper-V host as my svn repo, so it could be that.

timB33