
Our SVN repository is approaching 0.5 GB. We have nowhere near that amount of code in our production system.

Is it possible to remove old revisions? I tried svn dump with a beginning revision number, but to no avail. I couldn't import the result into a clean SVN repository.

We don't need the history over a year old.

Any ideas?

Wes
  • Have you considered buying a larger hard disk? 1 **terabyte** disks are affordable for most companies these days. – Mark Byers Dec 03 '10 at 21:50
  • Read this: http://stackoverflow.com/questions/681279/whats-the-best-way-to-clean-up-a-subversion-repository – Ish Dec 03 '10 at 21:51
  • Not sure about this, but doesn't SVN slow down as it has to run through all the revs to figure out what the HEAD files actually are? @Mark: Usually the cost of everything is at least double or triple, as redundancy is desired, then the cost of backup space. – Nick T Dec 03 '10 at 21:53
  • @Nick: No, SVN does not slow down if a file has more revisions. (The SVN project has been self-hosting for a _very_ long time, so it must be the oldest SVN repo around. If many old revisions were annoying, the SVN developers themselves would be the first to notice.) If you waste 5 hours on this, you have the cost of a 1 TB disk, including copying data and swapping the physical disks, and enough room to wiggle in a 0.5 TB external backup disk. – sbi Dec 03 '10 at 22:00
  • @Wes - could you specify how large a clean checkout of each 'trunk' (depending on whether you use a /trunk/projects or /projects/trunk repo layout) is? It may be that there is not much savings to be had, and the problem is more that large files are being committed that should not be. – Joshua McKinnon Dec 03 '10 at 22:40
  • @Joshua you're right; I'll check this out when at work. – Wes Dec 03 '10 at 22:50
  • @Wes also keep in mind that the space of a checkout is two copies of the files (one in the .svn folders, one actual), and that the SVN repository itself is highly compressed - still, it will give you a ballpark number, and you can compare it to the size of a full checkout a year ago and look for large files... – Joshua McKinnon Dec 03 '10 at 23:00
  • Also, SVN doesn't have to run through all revs to find HEAD. It saves a full copy once in a while (not sure how often). Kind of comparable to key frames in video compression. – Sander Rijken Dec 04 '10 at 12:23
  • @Peter Mortensen Thanks for the edit, but I'm sure this question is really not relevant to people now. 1. Disk space is a lot cheaper than it was in those days. 2. Not a lot of people use SVN now (that's a fact I just made up). 3. There is decent cloud hosting specifically for source control. – Wes Dec 07 '17 at 16:21

2 Answers


You can remove, or better, "shrink", the history of your SVN repository. Say you have 1000 revisions and you want to keep only revisions r950-r1000. You can do the following:

svnadmin dump /path/to/current/repo -r950:1000 > small_svn.dump
svnadmin create /path/to/new/repo
svnadmin load /path/to/new/repo < small_svn.dump
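
To sanity-check the result before switching over, you can verify the new repository and confirm how many revisions survived (svnadmin verify and svnlook are standard Subversion tools; the 51 below assumes the r950:1000 range from above, renumbered from 1 on load):

svnadmin verify /path/to/new/repo
svnlook youngest /path/to/new/repo   # should print 51 for the 51 dumped revisions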

However, there are two caveats:

First: all your tags and branches will end up as standalone copies and so will take much more space than before (this could even result in a bigger repository; you have to try). You can use svndumpfilter to remove tags and branches; however, you will then need the old repository to get at the information in those tags/branches.

Second: if your branches stay in your new repository, all mergeinfo will point at the wrong revisions, as your new repository starts counting from revision 0 again, and the branch points themselves are gone from the version history (due to the first caveat).
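
If you do decide to strip tags and branches as per the first caveat, a minimal sketch might look like this (assuming the conventional trunk/branches/tags layout; adjust the paths for your repository):

svnadmin dump /path/to/current/repo > full_svn.dump
# drop everything under branches/ and tags/, keeping only the trunk history
svndumpfilter exclude branches tags < full_svn.dump > trunk_only.dump
svnadmin create /path/to/new/repo
svnadmin load /path/to/new/repo < trunk_only.dump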

A much better solution:

  • Find the revision(s) responsible for the growth of your repository (search for large files in your repository's data storage, usually located under /path/to/repo/db/revs/[0...X]).
  • Check the log history of these revisions and locate the files responsible.
  • If you do not need these files, remove them via svndumpfilter (see the sketch after this list).
  • Teach your users to avoid committing unnecessary large files.

Otherwise you will have to shrink your repository again in a few weeks!
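
A rough sketch of that hunt, assuming an FSFS repository (the revision number 1234 and the path trunk/big/video.avi are hypothetical placeholders):

du -a /path/to/repo/db/revs | sort -n | tail -20    # list the largest revision files
svn log -v -r 1234 file:///path/to/repo             # inspect what a suspect revision committed
# if the offending file is not needed, filter it out of a fresh dump:
svnadmin dump /path/to/repo | svndumpfilter exclude trunk/big/video.avi > filtered.dump
svnadmin create /path/to/clean/repo
svnadmin load /path/to/clean/repo < filtered.dump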

Peter Parker

How can your business afford to waste time on this instead of just buying a bigger disk, moving the stuff over, and getting on with it? 1 TB costs the equivalent of 1-2 man-hours, plus the time needed to move the data and swap the disks.

sbi
  • Well, I wanted to back it up each night. – Wes Dec 03 '10 at 22:07
  • @Wes: 366 * 0.5 Gig < 1 Terabyte. – Mark Byers Dec 03 '10 at 22:09
  • First, the cost of a disk is much more than just the cost of the disk: we use remote hosting. Secondly, I struggle to get them to pay £30 for a book. Bandwidth isn't free either, and the time to transfer half a gig every time isn't trivial either. Oh, and the rate of growth is crazy. We had most of the work done in the first year, and the size of the repo was 80 MB; one year later it's around 500. – Wes Dec 03 '10 at 22:22
  • Supposedly, remote hosting is done because it's _cheaper_ than hosting yourself? Then how could "it's more expensive because we're remote-hosting" ever be a valid argument? – sbi Dec 03 '10 at 22:34
  • And as for £30 for a book: I used to have a boss who, when asked to buy a certain book, would ask back "Does this have a chance to save you X hours?", with the book's price/X being my hourly rate. When answered with "Yes", he'd buy the book. (I never had to answer with "No", but I suppose that, had I done so, he'd probably have told me to close his office's door and sit down, so he could have a talk with me to find out why I came bothering him about a book that I didn't think was worth its money. `:)` Too bad I had to leave there.) – sbi Dec 03 '10 at 22:35
  • Nope, it's not cheaper than hosting it ourselves, not by a looooong shot. Cost isn't the reason for the hosting. – Wes Dec 03 '10 at 22:52
  • Just out of curiosity, what exactly are you storing in source control that takes up that much space? – Goran Jovic Dec 03 '10 at 23:48
  • A [differential backup](http://en.wikipedia.org/wiki/Incremental_backup#Differential) scheme might be better than performing a full backup every night. – ldav1s Dec 04 '10 at 00:02
  • We use SVN to keep some big binary image files on cloud SVN. How can we buy a disk for a cloud service? It has a cost per GB/user. By the way, we don't need more than 10 revisions, so why should we pay for 10 GB of old revision files per month/user? We should find a way to delete the old revisions. – Mohammad Nikravan Aug 24 '12 at 18:16
  • Wes has a clear question about the possibilities of dropping old SVN revisions. How does it matter whether it's 1/2 GB or 1/2 TB... – alfonx Apr 08 '14 at 08:09
  • @alfonx: I generally prefer to point out what I consider an erroneous approach rather than answer the question it led to. Sometimes this triggers heated rejections like yours, and sometimes enthusiastically thankful replies by the OP. _Shrug_. – sbi Apr 08 '14 at 10:25