How can I combine a series of partial svn dumps into a single repository?

Question

I'm trying to recover a remote Subversion repository onto my local machine. I do not have direct access to the server to run shell commands, but I do have full svn permissions on the repository itself.

Due to some issue we have yet to identify, neither svnsync nor svnrdump nor anything else I've tried succeeds when run against the entire repository at once. Sometime during the operation, it fails with a message like "connection timed out" or "cannot access chunk". We haven't been able to find the source of the problem: it could be a software issue on the server, a corrupt repository, or perhaps just an unreliable network connection. Whatever the cause, the person who controls the server has been very slow to help us resolve it, so we're trying to work around it if we can.

I was able to dump the repository in batches of revisions, using a series of commands similar to these:

svnrdump dump -r0:499 https://server/svn/respository > 0-499.dump
svnrdump dump -r500:999 https://server/svn/respository > 500-999.dump
svnrdump dump -r1000:1499 https://server/svn/respository > 1000-1499.dump

This allowed me to push through the server issues. When a dump timed out or had other issues, I just retried that portion until it worked, or used a smaller increment. Now I have a number of dump files that together represent the entire repository.
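The retry-until-it-works approach described above could be scripted roughly as follows. This is only a sketch: the helper names `retry` and `dump_range` are made up for illustration, and the `.part` suffix is an assumption used so that a half-written dump from a failed attempt never ends up under the final file name.

```shell
#!/bin/sh
# retry N CMD...: run CMD up to N times, stopping at the first success.
retry() {
    attempts=$1; shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $n of $attempts failed: $*" >&2
        n=$((n + 1))
    done
    return 1
}

# dump_range URL RANGE OUTFILE: dump one revision range to OUTFILE.
# Output goes to OUTFILE.part first and is renamed only on success,
# so each retry starts from an empty file.
dump_range() {
    svnrdump dump -r "$2" "$1" > "$3.part" && mv "$3.part" "$3"
}

# Example use (commented out so the sketch has no side effects):
# retry 5 dump_range https://server/svn/respository 500:999 500-999.dump
```

If a range keeps failing after several attempts, the same helper can be called again with a smaller range.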

My question is: how can I combine these separate dumps into a single local repository?

I've tried doing this with an empty local repository:

svnadmin load repository < 0-499.dump
svnadmin load repository < 500-999.dump

The first command works, but the second one fails. The error message suggests that it's trying to add a file that already exists, and it gives up. I have found that I can do this instead:

svn mkdir batch1
svnadmin load --parent-dir "batch1" repository < 0-499.dump
svn mkdir batch2
svnadmin load --parent-dir "batch2" repository < 500-999.dump

This successfully loads the separate revision batches into separate directories within the repository, but I'm not sure how/if I can then recombine them into a single folder.

I'm also aware that I could use the --incremental switch when creating the dumps, but I'm not sure that's a good idea, since I suspect there may be some corruption in the incremental data. (One reason I suspect this is that running svnsync or git svn clone on the repository sometimes errors out with a checksum mismatch.)

Can I combine the non-incremental sequential dumps I have into a unified new repository somehow? If not, what other method should I use to do this considering svnsync and svnrdump have never succeeded when run against all revisions at once?

score 5 · Answer 1 · answered Nov 11 '13 at 01:27

You don't mention what version of Subversion you're using, but prior to 1.8.3 there was a problem with svnsync when using the serf HTTP library. Subversion 1.8.0 and newer always use serf for http/https; 1.5.0 through 1.7.x could optionally use it, depending on build-time and run-time configuration. The change we made shows up in the CHANGES file as:

* svnsync: fix high memory usage when running over ra_serf (r1515249 et al)

I believe this issue would affect svnrdump as well, since the fix was to the serf replay implementation, which svnrdump also uses.

This high memory usage would often result in very odd and random errors. In some cases the resulting swap usage on the machine would result in timeouts and other strange errors.

So first of all, try updating to Subversion 1.8.4 (the newest version at present) and see whether you can dump the entire repo now.
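To check whether a client predates the fix, something like the following might help. It is a sketch only: `version_lt` is a hypothetical helper, and it assumes the version string is purely dotted numbers, which is what `svn --version --quiet` normally prints for a release build.

```shell
#!/bin/sh
# version_lt A B: succeed if dotted version A is strictly older than B.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            # Missing components compare as 0, so 1.8 equals 1.8.0.
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1
    }'
}

# Example use (commented out so the sketch has no side effects):
# client=$(svn --version --quiet)
# if version_lt "$client" 1.8.3; then
#     echo "client $client predates the ra_serf memory fix" >&2
# fi
```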

Now back to your original question. You really should be using --incremental on every dump after the first. Your issue with load is entirely due to the lack of --incremental on those later dumps. Per the output of svnadmin help dump:

If --incremental is passed, the first revision dumped will describe only the paths changed in that revision; otherwise it will describe every path present in the repository as of that revision. (In either case, the second and subsequent revisions, if any, describe only paths changed in those revisions.)

Since you didn't pass --incremental, the first revision of each dump includes the full tree rather than just the changes, hence the conflicts when you try to load it.

The checksum errors you've seen with svnsync shouldn't be any more of a concern with --incremental than without it. --incremental only changes the output for the first revision in the range you requested. In fact, using --incremental makes the server do less work and is less likely to run into problems, since providing the full tree may require it to walk back into revisions it otherwise wouldn't need to touch.
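A batched dump along those lines might look like the sketch below. The function names, the batch size, and the START-END.dump naming are assumptions carried over from the question; the only Subversion feature relied on is that svnrdump dump accepts --incremental together with -r.

```shell
#!/bin/sh
# batch_ranges LAST SIZE: print one START:END range per line covering 0..LAST.
batch_ranges() {
    last=$1 size=$2 start=0
    while [ "$start" -le "$last" ]; do
        end=$((start + size - 1))
        [ "$end" -gt "$last" ] && end=$last
        echo "$start:$end"
        start=$((end + 1))
    done
}

# dump_batches URL LAST SIZE: write one dump file per range.
# The batch starting at r0 is a plain dump; every later batch is
# dumped with --incremental so its first revision holds only changes.
dump_batches() {
    url=$1
    batch_ranges "$2" "$3" | while read -r range; do
        out="$(echo "$range" | tr ':' '-').dump"
        case $range in
            0:*) svnrdump dump -r "$range" "$url" > "$out" ;;
            *)   svnrdump dump --incremental -r "$range" "$url" > "$out" ;;
        esac || exit 1
    done
}

# Example use (commented out so the sketch has no side effects):
# dump_batches https://server/svn/respository 1499 500
```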

There are probably ways to work around the missing --incremental option, but you'd essentially have to take the first revision of each dump and convert it back into an incremental set of changes before applying it. You might be able to do that by loading the dump into a scratch repo, exporting its tree over a working copy checked out from the combined repo, checking that in, and then fixing up the revision props (log, author, date, etc.) after the fact.

But all of that seems like an awful lot of work when you could just be using --incremental.
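Once the later batches are incremental, loading them is just a matter of feeding them to svnadmin load in revision order. A minimal sketch follows, assuming the files are named START-END.dump as in the question; `sorted_dumps` and `load_all` are hypothetical helper names. The numeric sort matters because a plain glob would place 1000-1499.dump before 500-999.dump.

```shell
#!/bin/sh
# sorted_dumps DIR: list START-END.dump files in ascending START order.
sorted_dumps() {
    ls "$1" | grep -E '^[0-9]+-[0-9]+\.dump$' | sort -t- -k1,1n
}

# load_all DIR REPO: create REPO and load every dump in DIR in order,
# stopping at the first load that fails.
load_all() {
    dir=$1 repo=$2
    svnadmin create "$repo" || return 1
    sorted_dumps "$dir" | while read -r f; do
        svnadmin load "$repo" < "$dir/$f" || exit 1
    done
}

# Example use (commented out so the sketch has no side effects):
# load_all . repository
```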

Regarding the checksum errors you mentioned, I wonder whether they might be related to the zlib issues we've noticed recently. You don't mention what platform you're on, but Windows builds of Subversion are usually built with an assembly-optimized version of zlib that happens to be buggy. It shouldn't be used, but it is. You can find details in this users@subversion.apache.org mailing list post.

In any case, if there is repository corruption then you will likely have a very hard time getting a useful dump. You may have to jump through some of these hoops or get help from the administrator of the repository.

My client is Tortoise 1.8.3 linked to Subversion 1.8.4 on Windows 8.1 64-bit Professional. The server is a Linux server, but I don't know what version of Subversion is being run on it. The server is very opaque to us. — Joshua Carmody, Nov 11 '13 at 01:51
I just tried doing incremental dumps but I found there are 4 revisions that svnrdump *will not* dump with the --incremental switch on. It will also freeze and die if it reaches them in the middle of a batch when doing a dump. However, if I start a non-incremental dump at that revision number it works fine. — Joshua Carmody, Nov 11 '13 at 02:12
Any chance this is a public server that you could share? Maybe there's a bug. — Ben Reser, Nov 11 '13 at 03:24
I'm afraid not. The server is run by the IT department of a company affiliated with the one I work for. They're supposed to keep it running for us. Unfortunately, we're not as high priority for them as some of their other projects. — Joshua Carmody, Nov 11 '13 at 05:03
