To start with, I have already read various posts on dealing with what seem like relatively small SVN corruptions. I am posting because we are concerned that the sheer number of corruptions implies our methodology is flawed, or alternatively that our recovery options are more limited than those posts suggest.
We are currently running a VisualSVN 2.1.3 repository (SVN 1.6.12), approximately 8 years old with roughly 50,000 revisions, stored on a Windows file share with an FSFS backend. Based on the VisualSVN release history, it is highly likely that the earliest portion of the repository was created under an older VisualSVN version, since the earliest revisions predate the release date of the version we currently run.
The repository has many subprojects. People tend to work on a subtree rather than the entire repo.
As part of upgrading some of our internal tools and infrastructure, we encountered various industry-standard tools reporting the missing-trailing-newline issue on the repo unless we limited them to the last 1,500 or so revisions. It is critical to note that until we ran these tools, nothing indicated any issue to the dev team on a day-to-day basis.
So we started running svnadmin verify, which immediately highlighted the 3rd commit as invalid, and the 4th, and the 11th, and so on. After some time spent trying to manually find a good starting point, we wrote a script to run svnadmin verify -r X on each individual revision.
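For reference, the per-revision check was roughly like the following sketch (the repository path is a placeholder, and the runner is injectable so the loop logic can be exercised without a live repo):

```python
#!/usr/bin/env python3
"""Sketch: verify each revision of an FSFS repo individually.

REPO_PATH is a hypothetical placeholder; substitute the real path.
"""
import subprocess

REPO_PATH = r"D:\Repositories\MyRepo"  # hypothetical path


def verify_revision(repo, rev, runner=subprocess.run):
    """Return True if `svnadmin verify -q -r REV` succeeds for this repo.

    `runner` defaults to subprocess.run but can be swapped out for testing.
    """
    result = runner(
        ["svnadmin", "verify", "-q", "-r", str(rev), repo],
        capture_output=True,
    )
    return result.returncode == 0


def find_bad_revisions(repo, head, runner=subprocess.run):
    """Check every revision from 0 through head and collect the failures."""
    return [r for r in range(head + 1)
            if not verify_revision(repo, r, runner)]
```

Running this over all ~50,000 revisions is slow but gives an exact list of non-verifiable revisions rather than stopping at the first failure.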
In the end it reports that over 1,000 of our commits are non-verifiable (we realize that one bad revision will also cause the next revision touching any of the affected files to fail verification, at a minimum doubling the corruption count). The corruptions are scattered across the bulk of the timeline, from 8 years ago to mostly 6 years ago (there is one bad commit in the last 2 years).
- This seems rather high for a repository that, for the most part, appears to be functioning. Is there a flaw in our verification plan?
- Assuming not, this seems like a high number. The only articles we could find on the topic imply that large commits can be a problem, but our corruptions are far more frequent than large commits alone would explain.
At this point, we have only completed the verification phase to identify the bad revisions, and are not yet clear on which code is impacted. It may be that all the corruptions are on dead projects/branches, or not.
We are looking for the recommended methodology given the number of corruptions and the size of the repository. All recovery options would include upgrading to a newer VisualSVN version and improving our SVN maintenance practices.
Option A - Incremental Dump/Load/Compare - now that we have identified all the bad revisions, we could do incremental dumps that skip them, load those dumps into a new repo, svn diff the two repos, and patch whatever does not match. We would lose some history, and depending on the specifics of the corruption we could be manually patching a LOT of files or very few, depending on whether they are still relevant. Either way, a lot of manual work.
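To make Option A concrete, here is a sketch of how we would turn the bad-revision list into a set of incremental dump commands (the repo path and dump-file naming are hypothetical; only the range-splitting logic is firm):

```python
#!/usr/bin/env python3
"""Sketch: split 0..HEAD into contiguous good ranges that skip bad revisions,
and emit one `svnadmin dump --incremental` command per range.

Repo path and output file names are placeholders for illustration.
"""


def good_ranges(head, bad_revs):
    """Return maximal contiguous (lo, hi) ranges of 0..head avoiding bad_revs."""
    bad = set(bad_revs)
    ranges, start = [], None
    for rev in range(head + 1):
        if rev in bad:
            if start is not None:
                ranges.append((start, rev - 1))
                start = None
        elif start is None:
            start = rev
    if start is not None:
        ranges.append((start, head))
    return ranges


def dump_commands(repo, head, bad_revs):
    """Build one incremental dump command per good range."""
    return [
        f"svnadmin dump {repo} --incremental -r {lo}:{hi} > dump_{lo}_{hi}.dmp"
        for lo, hi in good_ranges(head, bad_revs)
    ]
```

One caveat we are aware of: loading dumps that skip revisions renumbers everything in the new repo, and later deltas may not apply cleanly against content that lived only in the skipped revisions, which is exactly why the diff-and-patch step is needed afterwards.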
Option B - Export/Import - export the current HEAD of the repo and import it into a new repo, losing all the history. We would still do a compare for sanity, but would not anticipate many diffs.
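Option B would amount to something like the following command sequence (the URL, repo path, and working directory are all placeholders; the plan is built as data so it can be reviewed before running):

```python
#!/usr/bin/env python3
"""Sketch: the Option B command sequence as a reviewable plan.

All URLs and paths below are hypothetical placeholders.
"""


def export_import_plan(old_repo_url, new_repo_path, work_dir):
    """Return the shell commands for a flatten-to-HEAD migration."""
    return [
        # 1. Unversioned snapshot of HEAD only (no .svn metadata, no history).
        f"svn export {old_repo_url} {work_dir}",
        # 2. Fresh, empty repository.
        f"svnadmin create {new_repo_path}",
        # 3. Single initial commit containing the snapshot.
        f'svn import {work_dir} file:///{new_repo_path} '
        f'-m "Initial import after corruption recovery"',
    ]
```

The appeal here is that svn export never reads the corrupt historical deltas at all, only the fully reconstructed HEAD tree, at the cost of discarding every prior revision.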
Option C?
Thank you!