
I have a GridFS MongoDB database that I need to manage the size of. It has been running very well since it was created, but I have never really looked at its disk size until now.

Judging by this output from the db.stats() command:

> db.stats()
{
    "db" : "documents",
    "collections" : 4,
    "objects" : 10967,
    "avgObjSize" : 52491.573994711405,
    "dataSize" : 575675092,
    "storageSize" : 595255296,
    "numExtents" : 24,
    "indexes" : 4,
    "indexSize" : 686784,
    "fileSize" : 2080374784,
    "nsSizeMB" : 16,
    "ok" : 1
}

it seems the database itself is roughly 600MB. This size makes sense to me as it is the same size as the database backups I get from mongodump. The file size is far larger though, and it gets worse when I look in the data directory itself in /var/lib/mongodb:

root@deathstar:/var/lib/mongodb# ls -la
total 2474036
drwxr-xr-x  5 mongodb mongodb       4096 Apr 15 09:28 .
drwxr-xr-x 62 root    root          4096 Mar  4 07:48 ..
drwxr-xr-x  2 mongodb mongodb       4096 Apr 13 11:48 documents
-rw-------  1 mongodb mongodb   67108864 Apr 15 09:16 documents.0
-rw-------  1 mongodb mongodb  134217728 Apr 13 11:48 documents.1
-rw-------  1 mongodb mongodb  268435456 Apr 13 11:48 documents.2
-rw-------  1 mongodb mongodb  536870912 Apr 15 09:16 documents.3
-rw-------  1 mongodb mongodb 1073741824 Apr 13 11:50 documents.4
-rw-------  1 mongodb mongodb   16777216 Apr 15 09:16 documents.ns
drwxr-xr-x  2 mongodb mongodb       4096 Apr 13 11:50 journal
-rwxr-xr-x  1 mongodb mongodb          5 Apr 13 11:46 mongod.lock
drwxr-xr-x  2 mongodb mongodb       4096 Apr 15 09:28 _tmp
-rw-------  1 mongodb mongodb   67108864 Apr 15 09:28 -v.0
-rw-------  1 mongodb mongodb   67108864 Apr 15 09:28 v.0
-rw-------  1 mongodb mongodb  134217728 Apr 15 09:28 -v.1
-rw-------  1 mongodb mongodb  134217728 Apr 15 09:28 v.1
-rw-------  1 mongodb mongodb   16777216 Apr 15 09:28 -v.ns
-rw-------  1 mongodb mongodb   16777216 Apr 15 09:28 v.ns

And this in /var/lib/mongodb/journal:

root@deathstar:/var/lib/mongodb/journal# ls -la
total 3145752
drwxr-xr-x 2 mongodb mongodb       4096 Apr 13 11:50 .
drwxr-xr-x 5 mongodb mongodb       4096 Apr 15 09:28 ..
-rw------- 1 mongodb mongodb 1073741824 Apr 15 09:28 j._2
-rw------- 1 mongodb mongodb         88 Apr 15 09:28 lsn
-rw------- 1 mongodb mongodb 1073741824 May  5  2012 prealloc.1
-rw------- 1 mongodb mongodb 1073741824 May  5  2012 prealloc.2

Now correct me if I'm wrong, but I am basically looking at 5.5GB disk size for a 600MB database. That is pretty inefficient.

How can I reduce the disk size? Is there a similar command to OPTIMIZE TABLE in MySQL?

I don't know whether GridFS is a different beast from a regular database. I tried running compact, but it did nothing to the disk size.

And how about the journal files? Can I somehow reduce the disk size of all journal files?

c00kiemonster
  • This has nothing to do with GridFS. The journal is there to provide durability, and MongoDB always preallocates files before it needs them. When you mongodump you get neither the preallocated files nor the journal. If you want a smaller DB, look at the --smallfiles and --noprealloc options to mongod. I don't recommend ever running without journaling. – Asya Kamsky Apr 15 '13 at 03:07
  • Asya: both `--smallfiles` and `--noprealloc` are for the journalling, right? I'm going to try the former and not the latter as I still want journalling. How about the database files themselves? If I leave the journalling aside I am still looking at >2GB for a 600MB database... – c00kiemonster Apr 15 '13 at 03:19
  • Neither. They are for data files. They only control whether the files are allocated when needed or in advance, and how large the files are. Neither option turns off journaling; the ONLY way to do that is --nojournal. – Asya Kamsky Apr 15 '13 at 03:21
  • @Asya I decided just to use `--smallfiles` and it works just fine. Thanks for the help. – c00kiemonster Apr 23 '13 at 04:14
  • happy to help. do you think it would be worthwhile to summarize this in the answer? – Asya Kamsky Apr 23 '13 at 04:20
  • You should, and I would select it as an answer accordingly. – c00kiemonster Apr 23 '13 at 05:27

1 Answer

The issue with large files is not specific to GridFS.

The journal is there to provide durability, and MongoDB always preallocates files before it needs them. I would recommend not changing anything here, i.e. continue using journaling to protect your data in case of an unexpected server crash.

You see much smaller files with mongodump because the dump contains neither the preallocated data files nor the journal files.
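
As a rough check on where the space goes, here is a small Python tally of the byte sizes copied from the two `ls -la` listings in the question. The grouping into "documents", "other databases" and "journal" is my own reading of the file names (an assumption), and the directory entries and mongod.lock are ignored.

```python
# Byte sizes copied from the `ls -la` listings in the question.
# The grouping below is an assumption based on the file name prefixes.
documents_data = {
    "documents.0": 67108864,
    "documents.1": 134217728,
    "documents.2": 268435456,
    "documents.3": 536870912,
    "documents.4": 1073741824,
}
documents_ns = 16777216  # documents.ns

# Files belonging to the "v" and "-v" databases in the same dbpath.
other_databases = {
    "-v.0": 67108864, "v.0": 67108864,
    "-v.1": 134217728, "v.1": 134217728,
    "-v.ns": 16777216, "v.ns": 16777216,
}

journal = {
    "j._2": 1073741824,
    "lsn": 88,
    "prealloc.1": 1073741824,
    "prealloc.2": 1073741824,
}

DATA_SIZE = 575675092  # dataSize reported by db.stats()

documents_files_total = sum(documents_data.values())
documents_total = documents_files_total + documents_ns
other_total = sum(other_databases.values())
journal_total = sum(journal.values())
grand_total = documents_total + other_total + journal_total

GIB = 1024 ** 3
for label, size in [
    ("documents data files", documents_files_total),
    ("documents incl. .ns", documents_total),
    ("other databases", other_total),
    ("journal", journal_total),
    ("everything", grand_total),
    ("actual data (dataSize)", DATA_SIZE),
]:
    print(f"{label:<24} {size:>12} bytes  {size / GIB:6.2f} GiB")
```

The five documents.N files add up to 2080374784 bytes, which is exactly the fileSize that db.stats() reports, and the journal directory alone is larger than all the data files combined.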

If you want a smaller DB directory, I recommend looking at the --smallfiles and --noprealloc options to mongod. Both affect when space is allocated and how much is allocated at a time.
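
To illustrate "how much is allocated at a time", the sketch below models the data file size progression. The default sequence (64 MB, doubling per file) can be read straight off the listing in the question. The 2 GB cap and the --smallfiles values (16 MB start, 512 MB cap) are from my recollection of the mongod documentation for this storage engine, so treat them as assumptions. `data_file_sizes` is a hypothetical helper name, not a MongoDB API.

```python
MB = 1024 * 1024


def data_file_sizes(count, smallfiles=False):
    """Return the sizes in bytes of the first `count` data files
    (dbname.0, dbname.1, ...) under a simple doubling model.

    Assumed parameters: 64 MB start and 2048 MB cap by default,
    16 MB start and 512 MB cap with --smallfiles.
    """
    size = (16 if smallfiles else 64) * MB
    cap = (512 if smallfiles else 2048) * MB
    sizes = []
    for _ in range(count):
        sizes.append(size)
        # Each new file is twice the previous one until the cap is reached.
        size = min(size * 2, cap)
    return sizes


default_sizes = data_file_sizes(5)
small_sizes = data_file_sizes(5, smallfiles=True)

print("default     :", [s // MB for s in default_sizes], "MB, total", sum(default_sizes) // MB, "MB")
print("--smallfiles:", [s // MB for s in small_sizes], "MB, total", sum(small_sizes) // MB, "MB")
```

Under this model the first five default files match documents.0 through documents.4 in the question. The model only covers file sizes; it does not capture when mongod decides to preallocate the next file, which is what --noprealloc changes.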

Asya Kamsky
  • I found quite a few people asking about stability and so on when using `--smallfiles`. In my relatively small example everything is working completely fine, with smaller disk usage exactly as advertised. I can't really notice any difference at all. – c00kiemonster Apr 23 '13 at 05:37
  • It's not related in any way to stability. The only difference will be performance: once you fill a file you will need to wait for the next file to be allocated (with --noprealloc), and with --smallfiles you will be waiting for files more frequently. – Asya Kamsky Apr 23 '13 at 13:33