We recently finished moving thousands of static PDFs that used to be stored in eXist-db over to Amazon S3. All of these files are now retrieved through an application that points to Amazon S3 rather than the database; the database is only used to build out the information to be retrieved.
So we deleted all of these files from the database. By our estimate, the deleted files represented about 60% (probably closer to 80%) of the overall size of the database.
Because the actual DB is replicated to multiple countries daily, we were hoping this would reduce the size of all the database files considerably.
There is no change. Is this expected? Or what steps need to be performed to actually reclaim this space?
We tried stopping and starting the database. We also tried backing up the (now empty) collections and restoring just those collections, thinking this would trigger a cleanup. Neither worked.
Is there any way to reclaim this space? Do we actually have to back up the entire DB and restore it into a clean instance?
Update I
OK, looking at the directory structures of two different installations that are nearly identical, except for the PDFs ...
The installation with the PDFs deleted is actually 10MB larger than the one with the PDFs still present. Examining the /fs directory, the PDFs are gone, and the collections of PDFs that were removed total about 800MB.
So we removed 800MB from the database (the /fs directory is 800MB smaller), but the overall size increased by 10MB.
The 60% figure speculated above is incorrect: the overall directory is about 2.4GB, so 800MB is roughly a third of it, not 60%.
But still, I would expect that if I removed 800MB of data from the database, some reduction in size would occur and certainly not an increase of 10MB.
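For anyone who wants to repeat the comparison, here is a minimal sketch in Python of how the two installations could be compared entry by entry, to see which files or subdirectories grew and which shrank. It only uses the standard library; the function names are mine, and nothing here is specific to eXist-db.

```python
import os

def dir_size(path):
    """Return the total size in bytes of all files under path.

    Symbolic links are counted as the link itself, not the target.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total

def size_by_entry(data_dir):
    """Map each top-level entry of data_dir to its size in bytes."""
    sizes = {}
    for entry in os.listdir(data_dir):
        full = os.path.join(data_dir, entry)
        if os.path.isdir(full) and not os.path.islink(full):
            sizes[entry] = dir_size(full)
        else:
            sizes[entry] = os.lstat(full).st_size
    return sizes

def compare(before_dir, after_dir):
    """Return {entry: after - before} in bytes for every top-level entry."""
    before = size_by_entry(before_dir)
    after = size_by_entry(after_dir)
    names = sorted(set(before) | set(after))
    return {name: after.get(name, 0) - before.get(name, 0) for name in names}
```

Calling `compare()` with the data directory of the installation that still has the PDFs and the one where they were deleted (both paths are whatever they are on your system) returns the per-entry difference in bytes. In a case like the one described here, the entry for the fs directory should be strongly negative while one or more of the other entries account for the growth.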
Update II
Since the collections the PDFs were deleted from had no remaining content, I created a simple XML file and dropped it into each of them. I then did a backup of those two (now almost empty) collections and restored them.
That did nothing.
So it seems (I am guessing) that only a backup and restore of the whole DB will do anything.