
I have a repo on an old svn server running RHL9, with svn version 1.1.4. The repo is 1.1GB on disk (du -sh $REPO) and its full dump is 1.7GB. I loaded the dump on a recent svn server running Ubuntu 16.04, with svn version 1.9.3, using the following command:

svnadmin load --bypass-prop-validation -q "/path/to/repo.svn" < "/path/to/repo.dump"

Now the repo is only 412MB (du -sh).

I only administer the server; I don't use svn myself. When I look at the repo log in Tortoise, all revisions and all documents seem to be there (I can't check all 3733 revs manually).

But I don't know how to check whether this size difference means data was lost.

How can I find out (from Tortoise or the server CLI; I'm root)? Is this size difference shocking to you?

jps

2 Answers


Spot-check that the repo looks sane at HEAD, at the last couple of significant commits, and maybe way back at the beginning. Only go back as far as your needs require: development, support, and maybe compliance.
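
A spot check like this can be scripted. The sketch below assumes both servers are reachable through the `svn` client; the two URLs are placeholders, not the real ones. It exports the same revision from each server and compares the trees with `diff -r`.

```shell
#!/bin/sh
# Placeholder URLs (assumptions): replace with the real repository URLs.
OLD_URL="${OLD_URL:-svn://old-server.example/repo}"
NEW_URL="${NEW_URL:-svn://new-server.example/repo}"

# Succeeds only when the two directory trees have identical content.
trees_match() {
    diff -r "$1" "$2" >/dev/null
}

# Export one revision from both servers and compare the results.
# "svn export" writes a clean tree without a .svn admin area,
# so no wc.db difference shows up.
compare_rev() {
    rev=$1
    tmp=$(mktemp -d) || return 2
    svn export -q -r "$rev" "$OLD_URL" "$tmp/old" &&
        svn export -q -r "$rev" "$NEW_URL" "$tmp/new" &&
        trees_match "$tmp/old" "$tmp/new"
    status=$?
    rm -rf "$tmp"
    return $status
}

# Revisions to check are passed as arguments, e.g.: sh check.sh 3733 1836 5
for rev in "$@"; do
    if compare_rev "$rev"; then
        echo "r$rev: identical"
    else
        echo "r$rev: DIFFERENT or export failed"
    fi
done
```

On the new server itself, `svnadmin verify /path/to/repo.svn` checks the stored data for corruption, and `svnlook youngest /path/to/repo.svn` prints the latest revision number, which should match the old server's.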

Take a backup of the dump and archive it for as long as you feel you need it.
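
One possible way to archive the dump, sketched with standard tools: compress it with `gzip` and store a SHA-256 checksum next to it, so the archive can be verified later. The function name and the `.sha256` suffix are illustrative choices, not an svn convention.

```shell
#!/bin/sh
# Compress a dump file and record a checksum beside it.
# The original dump is left in place.
archive_dump() {
    dump=$1
    gzip -c "$dump" > "$dump.gz" || return 1
    sha256sum "$dump.gz" > "$dump.gz.sha256"
}

# Re-check an archived dump against its recorded checksum.
verify_archive() {
    sha256sum -c "$1.gz.sha256" >/dev/null
}
```

For example, `archive_dump /path/to/repo.dump` followed later by `verify_archive /path/to/repo.dump`. Use an absolute path, because the checksum file records the path exactly as given.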

To quantify the gains you got from getting rid of per-revision files, try svnadmin pack on a copy of the original repo.

John Mahowald
  • I cannot do a `svnadmin pack`, for the following reasons: the old server does not support `svnadmin pack` (svn version 1.1.4); on the new server (svn version 1.9.4), when I try `svnadmin pack` after upgrading the repo copy, I get this error: `svnadmin: E125006: 'REPO/db/format' specifies logical addressing for a non-sharded repository`. I tried `file://` and an absolute path, same error. – jps Apr 02 '19 at 08:23
  • I did a checkout of the last revision from both servers, then did a `diff -r`. Only the `REPO/.svn/wc.db` file changed, which is expected. The fun thing is, both checkouts weigh 6.6GB. I pulled 6.6GB of data from a 412MB repo. How the hell is that possible? – jps Apr 02 '19 at 08:29
  • You can forget the pack idea; that was merely an experiment to see whether the many files of previous formats were contributing to the larger space consumption. – John Mahowald Apr 02 '19 at 10:41
  • A working copy is much bigger because it has a different structure than the repo, notably the .svn admin area. http://svnbook.red-bean.com/en/1.8/svn.basic.in-action.html#svn.basic.in-action.wc An FSFS repo is compressed and packed. Its format has changed a few times over those versions, so it is difficult to quantify exactly what improved its space consumption, or whether you would get a benefit simply from dumping and loading on an older version. – John Mahowald Apr 02 '19 at 10:44
  • Well, I pulled revisions 3733 (youngest), 3730, 3726, 1836 and 5 (relevant recent commits, a random old commit and the first big commit) from both servers. They are the same size, and `diff -r` reports that they are strictly identical except for the file `$CHECKOUT/.svn/wc.db`. I guess svn did a masterful job of compression and packing between version 1.1.4 and 1.9.3... – jps Apr 02 '19 at 14:03

SVN has done a lot to reduce repository size since SVN 1.4 (I think), so you are now seeing the combined result of these development efforts (SVN original source).

  1. A new repository does not contain old dead transactions (which were not removed in SVN 1.1).
  2. Older SVN repos did not store the contents of a file in compressed form.
  3. Deltas between modifications are now also stored in compressed form, which saves space in later revision files.
  4. SVN introduced representation sharing, meaning two identical files are stored only once. This can be optimized further with the svnadmin pack command (which packs 1000 revs into a single file and de-duplicates all contents).

Modern algorithms can compress a lengthy text to about 15% or less of its original size. See here for more numbers and data on compression.
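
To get a rough feel for that ratio on your own files, here is a minimal sketch that uses `gzip` as a stand-in. That is an assumption for illustration; it does not reproduce how the repository stores data. The function prints the compressed size as a percentage of the original size.

```shell
#!/bin/sh
# Print the gzip-compressed size of a file as a percentage of its original size.
ratio_percent() {
    orig=$(wc -c < "$1")
    comp=$(gzip -9 -c "$1" | wc -c)
    [ "$orig" -gt 0 ] || return 1   # avoid dividing by zero on empty files
    echo $(( comp * 100 / orig ))
}
```

Running `ratio_percent some-source-file.txt` on a typical text file from the project gives an idea of how much of the size drop compression alone could explain.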

Peter Parker