Dealing with multiple svn corruptions over multi-year period

Question

To start with, I have already read various postings on dealing with what seem like relatively small SVN corruptions. I am posting this because we are concerned that the sheer number of corruptions implies our methodology is flawed or alternatively our recovery options leading to success are more limited.

We are currently running a VisualSVN 2.1.3 repository (SVN 1.6.12) that is approximately 8 years old with approximately 50,000 revisions running on a windows file share for FSFS backing. Based on VSVN history, it is highly likely that the very early portion of the repository was running on a prior version of VSVN since the earliest revisions predate the currently running version release date.

The repository has many subprojects. People tend to work on a subtree rather than the entire repo.

As part of upgrading some of our internal tools and infrastructure we encountered various industry standard tools reporting the missing trailing newline issue on the repo unless we limited them to the last 1500 or so revisions. It is critical to note, that until we ran these tools there was nothing indicating any issue to the dev team on a day to day basis.

So we started running svnadmin verify which immediately highligted the 3rd commit as invalid, and the 4th, and the 11th and so on. After a while of attempting to manually find the good point, we wrote a script to call svnadmin verify -r X on each individual revision.

In the end it reports that over 1,000 of our commits are non-verifiable (we realize that 1 bad revision will also force the next revision for any of the impacted files to also be non-verifiable, at a minimum doubling the corrpution count). The corruptions are scattered across the bulk of the timeline, from 8 years ago to mostly 6 years ago (there is one bad commit in the last 2 years)

This seems rather high for a repository that for the most part appears to be functioning - is there a flaw in our verification plan?
Assuming not, this seems like a high number, and the only articles we could find on the topic imply big commits could be a problem, but we seem to have a much more frequent issue than big commits.

At this point, we have only completed the verification phase to identify the bad revisions, but are not yet clear on which code is impacted. It may be all the corruptions are on dead projects/branches, or not.

We are looking to see which methodology is recommended based on the number of corruptions/size of repository. All recovery options include upgrading to a newer VSVN version and improving our SVN maintenance practices.

Option A - Incremental Dump/Load/Compare - now that we have identified all the bad revisions we could do incremental dumps skipping the bad ones. We can then load this dump into a new repo, svn diff the two repos and see what doesn't match and patch the remaining differences. We would lose some history, but depending on corruption specifics we could either be manually patching a LOT of files or maybe very few depending on if they are relevant anymore. But a lot of manual work.

Option B - Export/Import - attempt to export the current revision of repo into a new repo, losing all the history. Would still do a compare for sanity but would not anticipate a lot of diffs.

Option C?

Thank you!

Add the errors produced by `svnadmin verify` to your questions, please. Are they all the same? There is a chance that there are much less than 1000 revisions corrupted and the actual output of `svnadmin verify` should help. — bahrep, Mar 11 '17 at 17:44

bahrep · Answer 1 · 2017-03-23T18:05:31.710

Those revisions that produce svnadmin: E160004: Revision file (rREVNUM) lacks trailing newline error are corrupted. I guess that their revision files are completely empty (are they?). If you don't have a backup to recover those revisions from, you'll have to repair the repository.

It is important to verify your repositories regularly and implement a reliable backup. To cover these tasks and help Subversion admins, the newest release - VisualSVN Server 3.6 - introduces a built-in scheduled backup solution and scheduled verification.

Setting up scheduled repository backup and verification for your Subversion repositories is only a matter of minutes. For step-by-step instructions, please see the article KB106: Getting Started with Backup and Restore.

Option A - Incremental Dump/Load/Compare

This approach should work. However, instead of skipping those broken revisions, you should replace them with an empty revision. This step is important to ensure that the revision numbering in the repository won't be shifted as a side-effect of the repair.

BTW, you did check the disks with the repositories for failures, right? If you haven't yet, please run chkdsk or similar tool.

Option B - Export/Import

Try option A first. There is a great chance that you'll replace those broken revisions and successfully repair the repository.

IMPORTANT: VisualSVN Server 2.1.x version family is very outdated and is not supported beginning from September 2013. VisualSVN Server 2.1.x does not receive patch updates or security fixes anymore. We recommend all VisualSVN Server users to update to the latest VisualSVN Server 3.6 release. Read the KB95: Upgrading to VisualSVN Server 3.6 article before upgrading.

Thanks for your response. As far as we can tell, all the errors were the same - that being the E160004 error. An important point is that the corruptions all appear to be at the earlier end of the repository history. At some point we migrated to a new filesystem - based on a NetApp appliance. There do not appear to be any corruptions since that migration. As far as your suggestion to add empty revisions in place of the corrupt ones, can you help me understand how this strategy can effectively recover the repository? I thought that SVN stores incremental diffs, thus making recovery impossible??? — lmgtfy, Mar 14 '17 at 12:22

Dealing with multiple svn corruptions over multi-year period

1 Answers1