We recently finished moving thousands of static PDFs out of eXist-db and into Amazon S3. All these files are now retrieved through an application that points to Amazon S3 rather than the database; the database is only used to build the information to be retrieved.

We then deleted all these files from the database. We estimated that they represented about 60% (probably closer to 80%) of the size of the overall database.

Because the actual DB is replicated to multiple countries daily, we were hoping this would reduce the size of all the database files considerably.

There is no change. Is this expected? Or what steps need to be performed to actually reclaim this space?

We tried stopping and starting. We also tried backing up the (now empty) collections and restoring just those collections, thinking this would trigger things. Neither worked.

Is there (ever) any way to reclaim this? Do we actually have to back up the entire DB and restore into something clean?

Update I

OK, looking at the directory structures of two different installations that are nearly identical, except for the PDFs ...

The installation with the PDFs deleted is actually 10MB larger than the one with the PDFs still present. Examining the /fs directory confirms the PDFs are gone, and the collections of PDFs that were removed total about 800MB.

So we removed 800MB from the database (the /fs directory is 800MB smaller), but the overall size has increased by 10MB.

The estimate above is incorrect: the overall directory is about 2.4GB, so 800MB is roughly a third of it, not 60%.

But still, I would expect that if I removed 800MB of data from the database, some reduction in size would occur and certainly not an increase of 10MB.
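A comparison like the one above can be scripted so that it is repeatable. The following is only a minimal sketch: it sums file sizes per extension under two data directories and reports the difference, which should show whether the change sits in the files under /fs or in the .dbx files. The function names `size_by_extension` and `compare` are hypothetical helpers, not part of eXist-db, and grouping by file extension is an assumption about what is useful to look at.

```python
import os
from collections import defaultdict


def size_by_extension(root):
    """Return {extension: total_bytes} for all regular files under root."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Skip symlinks so linked files are not counted twice.
                continue
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext] += os.path.getsize(path)
    return dict(totals)


def compare(before_root, after_root):
    """Return {extension: after_bytes - before_bytes} for both trees."""
    before = size_by_extension(before_root)
    after = size_by_extension(after_root)
    return {
        ext: after.get(ext, 0) - before.get(ext, 0)
        for ext in sorted(set(before) | set(after))
    }
```

Calling `compare(dir_with_pdfs, dir_without_pdfs)` on the two data directories (both paths are placeholders here) returns a negative number for an extension that shrank and a positive one for an extension that grew.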

Update II

Since the collections whose PDFs were deleted now had no content, I created a simple XML file and dropped it into each collection. I then backed up those two collections and restored the (now) almost empty collections.

That did nothing.

So it seems that (guessing) only a whole DB backup/restore will do anything.

Kevin Brown
  • Hi Kevin, [this thread](https://sourceforge.net/p/exist/mailman/message/32758242/) indicates that there is no mechanism for reclaiming space. Apparently it hangs around until reused. To free it you'd have to *backup, stop, delete the storage from webapp/WEB-INF/data, start and then restore the database.* – kjhughes Feb 11 '19 at 23:25
  • @kjhughes, see above update II for a comment to that. I believe I agree with that. – Kevin Brown Feb 11 '19 at 23:36
  • Or I guess ... maybe eventually, as things change in the database, that space will be reused? Really not comforting. The good news is that the next step is upgrading to the latest eXist-db, so we will be doing a full backup and restore. We just need to make sure, I guess, that the backup is made with the PDFs already deleted, or we can never remove the space (yuck). – Kevin Brown Feb 11 '19 at 23:43
  • I can understand it though; it is like, say, a Microsoft Outlook db. I would have hoped for something like their "compress" feature, though, which would reorganize everything and compress the size taken in a batch operation. I assume that there is no such thing in eXist then. – Kevin Brown Feb 11 '19 at 23:46
  • Yes, that's the impression I get from your experience and the thread I cited. eXist-db, once given the space, considers the space its to manage as it sees fit. Deletions are marked as such but the storage is reserved for future use. You could test this theory by adding a bunch of synthetic data and observing no db size growth while it's reusing the space. – kjhughes Feb 11 '19 at 23:49
  • @kjhughes is correct. eXist-db does not reclaim space in the .dbx files. You will need to take a full backup, stop the server, remove the contents of the data folder, start the server, and restore your backup. – adamretter Feb 12 '19 at 10:37
  • @kjhughes post as an answer so I can accept – Kevin Brown Feb 12 '19 at 17:28
  • I'll defer to @adamretter for that. My comment was just a google guess. Adam's the authority. ;-) – kjhughes Feb 12 '19 at 17:44
  • Kjhughes did the work and so he/she should get the credit! – adamretter Feb 13 '19 at 06:18
