
I'm in the process of upgrading our database to Mongo 3.0, and I'm at the step of upgrading our daily backup process from mongodump 2.6.1 to 3.0.1, which has greater performance due to parallelized collection downloads.

I'm running into an issue where mongodump fails midway through with the following error:

....
2015-04-10T00:42:54.606+0000    [##############..........]        XXX.XXXXXXX  6804841/11236617  (60.6%)
2015-04-10T00:42:57.352+0000    Failed: error reading collection: Closed explicitly.

Out of 8 attempts, 6 failed and 2 completed successfully. I haven't been able to find anything else online about this particular error.

  • The entire mongodump is around 1TB in size, with thousands of collections. The failure happens somewhere in the middle. The mongodump does actually start: many .bson files accumulate on disk, and I can see the progress lines in the mongodump output.
    • Running the same code against a 150GB Mongo 2.4 instance seems to go fine, likely because the dump doesn't run long enough to hit the error.
  • The Mongo version I'm dumping from is 2.4; we're planning to upgrade 2.4 -> 2.6 -> 3.0. We wanted to upgrade the mongodump tool in advance, hoping it would work fine against 2.4 and 2.6.
  • The current backup servers use mongodump 2.6.1 against the 2.4 Mongo databases, and they have been humming along fine, with 100% reliability in the mongodump stage of the backup pipeline.
  • The mongodump backup servers (Google Compute Engine VMs) are separate machines from the Mongo servers (bare-metal servers), and the Mongo servers are behind a firewall. So we establish an SSH tunnel between the two machines, then run mongodump with the --port option. It looks like this:

    ssh -M -N -L 1234:localhost:27017 <remote_ip>
    mongodump --port 1234 --username XXX --password XXX --out /tmp/dir
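
Since 2 of the 8 attempts did go through, one stopgap I can think of is wrapping the dump in a retry loop. Below is a minimal sketch of that idea, not something we have in production: the `retry` function, the attempt count, and the delay are all hypothetical, and each retry would start the whole dump over from scratch, which is expensive at 1TB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS attempts have been made.
retry() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    while true; do
        # Success: stop retrying.
        if "$@"; then
            return 0
        fi
        # Out of attempts: report and fail.
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Assumed usage over the tunnel above (values are placeholders, not executed here):
#   retry 5 60 mongodump --port 1234 --username XXX --password XXX --out /tmp/dir
```

This only papers over the failure rather than explaining it, and it assumes the SSH tunnel itself is still up when the next attempt starts.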
    

Can anyone give me some hints as to what might be going on? We will need to use mongodump 3.0 when our mongo databases are fully upgraded to 3.0.

UPDATE: Another error I'm getting is:

2015-04-14T22:56:37.939+0000    Failed: error reading collection: read tcp XXX.X.X.X:XXXXX: use of closed network connection
mattse
  • how did you fix that? – Bruno Lemos Jul 17 '16 at 03:12
  • We never fully fixed it. Reliability got a bit better when we moved the Mongo servers and the mongodump machines into the same data center on the same internal private network, but mongodump 3.0 was never changed to handle transient network failures better; the whole pipeline just dies once it hits the first error. We reached our reliability goals by doing [snapshot backups](https://docs.mongodb.com/manual/tutorial/backup-with-filesystem-snapshots/) and forgetting mongodump altogether. – mattse Jul 17 '16 at 18:08
