
Note: this question would usually be a better fit on the Database Administrators Stack Exchange; however, I have a hunch it requires some programming to solve, so I posted it here.


Is there a way to pass multiple queries to mongodump so that several collections can be filtered in the same pass? I suspect there isn't (I have trawled through the documentation with no luck), but I might have missed something along the way.
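For context, the closest I can get today is one filtered mongodump invocation per collection, along these lines (a minimal sketch in Python; the database name, collection names and queries are made up, but --db, --collection, --query and --out are the actual mongodump flags):

    import subprocess

    # One mongodump invocation per collection, since --query only
    # applies to the single collection named by --collection.
    filters = {
        "orders":    '{ "status": "shipped" }',
        "customers": '{ "active": true }',
    }

    for collection, query in filters.items():
        subprocess.run(
            ["mongodump",
             "--db", "mydb",              # hypothetical database name
             "--collection", collection,
             "--query", query,
             "--out", "dump/"],
            check=True,
        )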

With the --oplog option you can ensure that you have a point-in-time snapshot of the database.
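That is, a plain full-instance dump along these lines (sketch only; --oplog captures the operations that happen while the dump is running, so mongorestore --oplogReplay can bring the restore to a consistent point in time):

    import subprocess

    # Whole-instance dump; --oplog writes an oplog.bson alongside the
    # per-database dumps, to be replayed with mongorestore --oplogReplay.
    subprocess.run(["mongodump", "--oplog", "--out", "dump/"], check=True)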

That holds as long as you can dump the collections you need in a single command. However, if you want to filter multiple collections with different queries, you have to run several commands in succession, and the point-in-time guarantee no longer holds: between the first command finishing and the second one starting, there may have been operations that would affect the first command's result.

I thought about running several commands in parallel, but I think that would strain the production system unnecessarily, especially since each of them would have to dump its own oplog: you wouldn't know beforehand which command would finish last, and it is the oplog of that last command that you would need.

At the moment I am thinking of rolling my own solution: monitoring the oplog myself for the entire time the dump commands are running and writing it out in a format that is valid for mongorestore.
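Something along these lines (a very rough sketch using pymongo; the connection string and the stop condition are placeholders, and I am assuming that mongorestore's oplog.bson is nothing more than raw BSON oplog documents written back to back):

    import bson
    from pymongo import MongoClient, CursorType

    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    oplog = client.local["oplog.rs"]

    # Remember where the oplog currently ends; the filtered mongodump
    # commands would be started (in another process) right after this.
    newest = oplog.find().sort("$natural", -1).limit(1).next()
    start_ts = newest["ts"]

    with open("oplog.bson", "wb") as out:
        # Tailable cursor over the capped oplog collection, picking up
        # everything written after the recorded timestamp.
        cursor = oplog.find(
            {"ts": {"$gt": start_ts}},
            cursor_type=CursorType.TAILABLE_AWAIT,
        )
        for entry in cursor:
            out.write(bson.BSON.encode(entry))
            if dumps_finished():  # hypothetical "all dump commands done" check
                break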

However, before I attempt to reinvent the wheel, I want to know whether there is a supported way of achieving this, or at least a library that already does it, since I don't think this is a unique use case.

Hayko Koryun
  • I may be mistaken, but enabling the --oplog flag forces you to read from the oplog, which can only be done against servers that keep an oplog. To speed things up you could shard your db and run this against the separate shards. Keep in mind that the oplog is a capped collection, which means the order of docs in it is important. Consider also a direct table scan. Are you sure you need the db server to be running to perform the scan? If I am not mistaken, it can take the db files as input. If so, then you can have a slave for dumping and run per-collection, nightly dumps. – Edik Mkoyan Jun 21 '17 at 06:26

0 Answers