Like the title says, I have a MongoDB GridFS database with a whole range of file types (e.g., text, pdf, xls), and I want to back up this database the easiest way.

Replication is not an option. Preferably I'd like to do it the usual database way: dump the database to a file and then back up that file (which could be used to restore the entire database 100% later on if needed). Can that be done with mongodump? I also want the backup to be incremental. Will that be a problem with GridFS and mongodump?

Most importantly, is that the best way of doing it? I am not that familiar with MongoDB. Will mongodump work as well as mysqldump does with MySQL? What's the best practice for MongoDB GridFS and incremental backups?

I am running Linux if that makes any difference.

tshepang
c00kiemonster

1 Answer

GridFS stores files in two collections: fs.files and fs.chunks.

More information on this may be found in the GridFS Specification document: http://www.mongodb.org/display/DOCS/GridFS+Specification

Both collections may be backed up using mongodump, the same as any other collection. The documentation on mongodump may be found here: http://www.mongodb.org/display/DOCS/Import+Export+Tools#ImportExportTools-mongodump

From a terminal, this would look something like the following. For this demonstration, my database name is "gridFS".

First, mongodump is used to back up the fs.files and fs.chunks collections to a folder on my desktop:

$ bin/mongodump --db gridFS --collection fs.chunks --out /Desktop
connected to: 127.0.0.1
DATABASE: gridFS     to     /Desktop/gridFS
    gridFS.fs.chunks to /Desktop/gridFS/fs.chunks.bson
         3 objects
$ bin/mongodump --db gridFS --collection fs.files --out /Desktop
connected to: 127.0.0.1
DATABASE: gridFS     to     /Desktop/gridFS
    gridFS.fs.files to /Desktop/gridFS/fs.files.bson
         3 objects

Now, mongorestore is used to pull the backed-up collections into a new (for the purpose of demonstration) database called "gridFScopy":

$ bin/mongorestore --db gridFScopy --collection fs.chunks /Desktop/gridFS/fs.chunks.bson 
connected to: 127.0.0.1
Thu Jan 19 12:38:43 /Desktop/gridFS/fs.chunks.bson
Thu Jan 19 12:38:43      going into namespace [gridFScopy.fs.chunks]
3 objects found
$ bin/mongorestore --db gridFScopy --collection fs.files /Desktop/gridFS/fs.files.bson 
connected to: 127.0.0.1
Thu Jan 19 12:39:37 /Desktop/gridFS/fs.files.bson
Thu Jan 19 12:39:37      going into namespace [gridFScopy.fs.files]
3 objects found

Now the Mongo shell is started, so that the restore can be verified:

$ bin/mongo
MongoDB shell version: 2.0.2
connecting to: test
> use gridFScopy
switched to db gridFScopy
> show collections
fs.chunks
fs.files
system.indexes
> 

The collections fs.chunks and fs.files have been successfully restored to the new DB.
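Listing the collections only shows that they exist. A slightly stronger check is to compare document counts between the source and the restored database. The following is a minimal sketch, assuming the legacy `mongo` shell used above is on the PATH (newer installations ship `mongosh` instead); the function names `count_docs` and `verify_restore` are hypothetical helpers, not part of MongoDB.

```shell
#!/usr/bin/env bash
# Sketch: compare GridFS document counts between two databases.
# count_docs and verify_restore are illustrative names, not MongoDB tools.

# Print the number of documents in one collection of one database.
count_docs() {
    local db="$1" coll="$2"
    mongo --quiet --eval "print(db.getSiblingDB('$db').getCollection('$coll').count())"
}

# Succeed only if fs.files and fs.chunks have equal counts in both databases.
verify_restore() {
    local src="$1" dst="$2" coll
    for coll in fs.files fs.chunks; do
        if [ "$(count_docs "$src" "$coll")" != "$(count_docs "$dst" "$coll")" ]; then
            echo "MISMATCH in $coll" >&2
            return 1
        fi
    done
    echo "counts match"
}

# Example call, using the database names from the demonstration:
#   verify_restore gridFS gridFScopy
```

Equal counts do not prove the contents are identical, but they catch a truncated dump or a partial restore cheaply.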

You can write a script to perform mongodump on your fs.files and fs.chunks collections periodically.
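Such a script might look roughly like the sketch below. It assumes `mongodump` is on the PATH and reuses the "gridFS" database name from the demonstration; the function name `backup_gridfs` and the backup root directory are hypothetical and should be adapted.

```shell
#!/usr/bin/env bash
# Sketch: dump both GridFS collections into a directory named after today's date.
# backup_gridfs and the backup root path are illustrative assumptions.

backup_gridfs() {
    local db="$1" root="$2"
    local out coll
    out="$root/$(date +%Y-%m-%d)"
    mkdir -p "$out" || return 1
    for coll in fs.files fs.chunks; do
        # Stop at the first failure so a half-finished dump is not mistaken for a good one.
        mongodump --db "$db" --collection "$coll" --out "$out" || return 1
    done
    echo "Backup written to $out"
}

# Example call (hypothetical path):
#   backup_gridfs gridFS /var/backups/mongo
```

Saved as a script that calls the function, this could be run daily from cron, which also gives one dated snapshot directory per day.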

As for incremental backups, they are not really supported by MongoDB. A Google search for "mongodb incremental backup" reveals a good mongodb-user Google Groups discussion on the subject: http://groups.google.com/group/mongodb-user/browse_thread/thread/6b886794a9bf170f

For continuous backups, many users use a replica set. (I realize that your original question states this is not an option; it is included for other members of the Community who may be reading this response.) A member of a replica set can be hidden to ensure that it will never become Primary and will never be read from. More information on this may be found in the "Member Options" section of the Replica Set Configuration documentation: http://www.mongodb.org/display/DOCS/Replica+Set+Configuration#ReplicaSetConfiguration-Memberoptions
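For readers who do go the replica set route, hiding a member comes down to setting two fields in the replica set configuration. The sketch below assumes the legacy `mongo` shell is connected to the current primary; the function name `hide_member` and the member index passed to it are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: mark one replica set member as a hidden, non-electable backup node.
# hide_member is an illustrative name; the index refers to the position in rs.conf().members.

hide_member() {
    local idx="$1"
    mongo --quiet --eval "
        var cfg = rs.conf();
        cfg.members[$idx].priority = 0;   // never eligible to become primary
        cfg.members[$idx].hidden = true;  // not advertised to clients for reads
        rs.reconfig(cfg);
    "
}

# Example call (assumes the backup node is the third member):
#   hide_member 2
```

A hidden member must also have priority 0, which is why both fields are set together.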

Gabe Kopley
Marc
  • Although MongoDB doesn't have any incremental backup capabilities, shouldn't any external incremental backup job at the very least be able to see that there are old and new fs.files/fs.chunks and only back up the new ones? I'm going to play around with it a bit to see. Replication in my mind is a bit sketchy; I'd hate to be reliant on MongoDB itself for backups. Plus, ideally I'd like a daily snapshot for archiving purposes. Thanks much either way, very informative. – c00kiemonster Jan 20 '12 at 06:28
  • If the destination collection already exists, mongorestore will step through the _id of each document and only add new documents. You can supply a query to mongodump, so if your documents contain a "last updated" field, or equivalent, you can dump only the documents that were updated or added after the date of your last backup. You can also have another utility take a backup of your dbpath directory. There are some notes on this in the "Backups with Journaling Enabled" and "Shutdown and Backup" sections of the Mongo documentation on backups. http://www.mongodb.org/display/DOCS/Backups – Marc Jan 25 '12 at 23:33
  • That would come in very handy. Thanks for the tip – c00kiemonster Jan 26 '12 at 04:06
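The query-based approach from the comments could be sketched roughly as follows for fs.files, which carries an `uploadDate` field per the GridFS specification. This is only an illustration under assumptions: the function name `dump_files_since` is hypothetical, and the date syntax accepted by `--query` has varied between mongodump releases, so the extended-JSON form used here may need adjusting for your version. It also covers fs.files only; the matching fs.chunks documents would still have to be selected by their `files_id`, which this sketch does not attempt.

```shell
#!/usr/bin/env bash
# Sketch: dump only the fs.files documents uploaded after a given timestamp.
# dump_files_since is an illustrative name; fs.chunks is NOT handled here.

dump_files_since() {
    local db="$1" since="$2" out="$3"
    local query
    # Build a filter on uploadDate; the single quotes keep $gt and $date literal.
    query=$(printf '{"uploadDate": {"$gt": {"$date": "%s"}}}' "$since")
    mongodump --db "$db" --collection fs.files --query "$query" --out "$out"
}

# Example call (hypothetical timestamp and path):
#   dump_files_since gridFS 2012-01-25T00:00:00Z /var/backups/mongo/incr
```

Because mongorestore only adds documents whose _id is not already present, restoring such partial dumps on top of an earlier full restore would add just the newer files.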