
We have a three-server replica set running MongoDB 2.2 on Ubuntu 10.04, and recently had to upgrade the hard drive on each server where one particular database resides. This database contains log information for web service requests; the services write to collections in hourly buckets, using the current timestamp to determine the collection name, e.g. log_yyyymmddhh.

I performed this process (sketched as shell commands after the list):

  • backup the database on the primary server with mongodump --db log_db
  • take a secondary server offline, replace the disk
  • bring the secondary server up in standalone mode (i.e. comment out the replSet entry in /etc/mongodb.conf before starting the service)
  • restore the database on the secondary server with mongorestore --drop --db log_db
  • add the secondary server back into the replicaset and bring it online, letting replication catch up the hourly buckets that were updated/created while it had been offline
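
For reference, here is a rough sketch of those steps as shell commands. The backup path is an assumption for illustration, not an exact transcript:

    # 1. Back up the log database from the primary
    mongodump --db log_db --out /backup/log_dump

    # 2. Take the secondary offline and replace the disk, then start it
    #    standalone: comment out "replSet = ..." in /etc/mongodb.conf
    sudo service mongodb restart

    # 3. Restore the database onto the standalone secondary
    mongorestore --drop --db log_db /backup/log_dump/log_db

    # 4. Re-enable the replSet line in /etc/mongodb.conf and restart,
    #    letting replication catch up from the primary's oplog
    sudo service mongodb restart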

Everything seemed to go as expected, except that the collection which was the current bucket at the time of the backup was not brought up to date by replication. I had to copy that collection over by hand to get it up to date. Note that collections created after the backup were synced just fine.

What did I miss in this process that caused MongoDB not to bring that one collection back in sync? I assume something got out of whack with regard to the oplog?

Edit 1:

The oplog on the primary showed that its earliest timestamp went back a couple of days, so there should have been plenty of room to retain operations for the few hours the secondary was offline.
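
For the record, a quick way to check this from the shell (the exact output format varies by version):

    # On the primary: report oplog size and first/last event times
    mongo --eval "db.printReplicationInfo()"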

Edit 2:

Our MongoDB installation uses two disk partitions: /dev/sda1 and /dev/sdb1. The primary MongoDB directory /var/lib/mongodb/ is on /dev/sda1 and holds several databases, while the log database resides by itself on /dev/sdb1. There's a symlink /var/lib/mongodb/log_db which points to a directory on /dev/sdb1. Since the log db was getting full, we needed to upgrade the disk behind /dev/sdb1.
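
To illustrate the layout (the mount point below is an assumed example; this arrangement relies on the directoryPerDB option so each database gets its own directory):

    # /dev/sda1 holds the main dbpath, /dev/sdb1 holds only the log database
    ls -l /var/lib/mongodb/log_db
    # /var/lib/mongodb/log_db -> /mnt/log_disk/log_db   (on /dev/sdb1)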

Alan
  • 3,715
  • 3
  • 39
  • 57
  • Was it all new operations on the current bucket or all operations older than a certain time? – Sammaye Jul 02 '14 at 22:08
  • Actually, I didn't check that closely. I just saw that the number of documents was less than that on the primary server, and assumed that newer operations hadn't been replicated. Maybe it was the other way around! Maybe that collection wasn't included in the initial dump? – Alan Jul 02 '14 at 22:11
  • I am thinking that your oplog might have been too small: by the time your secondary was ready to sync, it may no longer have held the older operations on the bucket collection that was current when the dump was done, and only stored newer operations for buckets created afterwards – Sammaye Jul 02 '14 at 22:17
  • Yeah, I thought of that and had checked the primary server's oplog -- see Edit I just added to the question. – Alan Jul 02 '14 at 22:21
  • Can you find OPs meant for that bucket collection within the oplog? I guess that would be the first step to debugging this – Sammaye Jul 02 '14 at 23:23
  • Unfortunately, I waited too long to post this question. I did this a few days ago, so the oplog no longer contains those entries. I guess my question is about the overall process, does the process I used make sense or should I have done something different? – Alan Jul 02 '14 at 23:27
  • Frankly, this logically should have worked; even the answer has problems with this in consideration. The only thing I can think of is that if no oplog is present on the secondary, MongoDB won't pick up new ops for existing collections from the primary, but that doesn't make much sense and doesn't seem logical to me. It should have just read from the beginning of the primary's oplog and applied all operations, or applied no ops and said that it cannot determine the state of your upped secondary – Sammaye Jul 03 '14 at 07:07

2 Answers


I would guess this has to do with the oplog not being long enough, although it seems like you checked that and it looked reasonably big.

Still, when adding new members to a replica set you shouldn't be snapshotting and restoring them. It's better to simply add a new member and let replication happen by itself. This is described in the Mongo docs and is the process I've always followed.
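As a sketch (hostname and port are placeholders), adding a fresh member with an empty data directory from a mongo shell on the primary looks like this; the new node then performs an initial sync on its own:

    # Add the new member to the replica set
    mongo --eval "rs.add('newhost.example.com:27017')"

    # Watch it progress from STARTUP2/RECOVERING to SECONDARY
    mongo --eval "rs.status()"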

brianz
  • 7,268
  • 4
  • 37
  • 44
  • Technically this wasn't a new member. I had only taken the secondary server out of the replica set while I did the mongorestore. And I had done the dump/restore because I knew the oplog didn't go back far enough to contain all of the hourly buckets I wanted to preserve. – Alan Jul 02 '14 at 22:39
  • If you replaced the disk, it's basically a new server since it didn't have any data. Semantics aside, it's good practice to treat nodes like new members if there is anything abnormal going on IMO. – brianz Jul 02 '14 at 22:50
  • Right, however this exercise was about upgrading the disk for only one database, which resides on a separate partition (/dev/sdb1) from the rest of the server's databases. I'll add more details to the question! – Alan Jul 02 '14 at 22:57
  • Btw, thanks for your feedback on this. I did check the Mongo docs you referenced, and the "Production Notes" section there does discuss using a backup to get a member online quickly. I'm still not sure what I missed in this process. – Alan Jul 02 '14 at 23:50

You should be using mongodump with the --oplog option. Running a full database backup with mongodump against a replica set that is updating collections at the same time may not leave you with a consistent backup. This becomes worse with larger databases, more collections, and more frequent updates/inserts/deletes.

From the documentation for your version (2.2) of MongoDB (it's the same for 2.6 but just to be as accurate as possible):

--oplog

Use this option to ensure that mongodump creates a dump of the database that includes an oplog, to create a point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay.

Without --oplog, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the update process can affect the output of the backup.

http://docs.mongodb.org/v2.2/reference/mongodump/
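
In concrete terms (the output path is a placeholder), a point-in-time dump and restore would look something like this; note that --oplog applies to a dump of the whole instance, run against a replica set member:

    # Dump everything plus a slice of the oplog covering the dump window
    mongodump --oplog --out /backup/dump

    # Restore the data, then replay the captured oplog entries on top
    mongorestore --oplogReplay /backup/dump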

This is not covered well in most MongoDB tutorials on backups and restores. Generally you are better off if you can perform a live snapshot of the storage volume your database resides on (assuming your storage solution has a live-snapshot capability compatible with MongoDB). Failing that, your next best bet is taking a secondary offline and then performing a snapshot or backup of the database files. Running mongodump against a live database becomes less and less practical as databases grow, due to its performance impact.
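
As an example of the volume-snapshot approach, on LVM it might look roughly like this (volume names are assumptions; with journaling enabled and the data files plus journal on the same volume, the lock step can often be skipped):

    # Flush writes to disk and block new ones
    mongo --eval "db.fsyncLock()"

    # Take a point-in-time snapshot of the volume holding the dbpath
    sudo lvcreate --snapshot --size 10G --name mongo-snap /dev/vg0/mongo-data

    # Allow writes again
    mongo --eval "db.fsyncUnlock()"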

I'd definitely take a look at the MongoDB overview of backup options: http://docs.mongodb.org/manual/core/backups/

John Petrone
  • 26,943
  • 6
  • 63
  • 68
  • You know, I had quickly read about the --oplog option, and totally not grokked its usage. Sounds like that's what I missed. Even better would have been to do as you recommended and taken the dump from the secondary while it was offline, before I replaced the disk. Thanks! – Alan Jul 03 '14 at 00:39
  • Even though this answer does state best practice, it doesn't go into how or why some ops were applied and others were not; but anyway, it seems the OP really wanted to know best practice for next time, so... – Sammaye Jul 03 '14 at 07:15