So while it is "possible" to use the shell (and no-one said it wasn't), it's just not the "best" way.
Dump and Restore with query
The "best" approach is using mongodump
and mongorestore
. You don't need "temporary dump files" either. It's just a matter of "piping" output from one into the other:
Where you put the -h option depends on which host you actually run this from. Running from the host where db1 lives, the -h goes on the dump side; the reverse case is sketched after the example:
# Dump matching documents from db2.item on host2 straight into db1.item locally
mongodump -h host2 -d db2 -c item \
    --query '{ "date": { "$gte": "2016-03-15" } }' \
    --out - \
| mongorestore -d db1 -c item -
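Conversely, if you run the pipeline from host2 itself, the -h option moves to the mongorestore side instead. A sketch, assuming db1 lives on a host named host1:

# Run from host2: dump locally, push the restore to host1 (assumed hostname)
mongodump -d db2 -c item \
    --query '{ "date": { "$gte": "2016-03-15" } }' \
    --out - \
| mongorestore -h host1 -d db1 -c item -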
From MongoDB 3.2 these commands can use compressed data as well. This needs the --gzip and --archive options:
# Same pipeline, but compressed "over the wire" in archive format
mongodump -h host2 -d db2 -c item \
    --query '{ "date": { "$gte": "2016-03-15" } }' \
    --gzip --archive \
| mongorestore -d db1 -c item --gzip --archive
That's generally the fastest way to move data between databases, and especially between hosts.
Using the shell
If you are insistent on writing this in the shell, then you should at least get it right.
Of course you can use the connect() or Mongo() methods to reference the remote connection (both are sketched below), but that is really only part of the story, since once connected you still need to handle the writes efficiently.
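For reference, both shell helpers get you to the same place; a minimal sketch, assuming the remote host is host2 and the source database is db2 as above:

// connect() takes a "host/database" string and returns the DB object directly
var db2 = connect('host2/db2');

// Equivalently, Mongo() creates the connection and getDB() selects the database
var db2 = new Mongo('host2').getDB('db2');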
The best way to do this is to use "Bulk Operations", as this removes the overhead of a request and acknowledgement for every single .insert() against the target server and collection. This cuts out a lot of time, though it's still not as efficient as the utilities above.
Modern MongoDB 3.2 has bulkWrite():
var db2 = connect('host2/db2');  // remote source connection
var operations = [];

// Note: "db" is assumed to be the target database (db1) the shell is connected to
db2.item.find({ "date": { "$gte": "2016-03-15" } }).forEach(function(doc) {
    operations.push({ "insertOne": { "document": doc } });

    // Actually only write every 1000 entries at once
    if ( operations.length == 1000 ) {
        db.item.bulkWrite(operations, { "ordered": false });
        operations = [];
    }
});

// Write any remaining
if ( operations.length > 0 ) {
    db.item.bulkWrite(operations, { "ordered": false });
}
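As a usage note, bulkWrite() returns a result document, so each batch can be confirmed as it commits; a minimal sketch:

// bulkWrite() reports counts for the batch just committed
var result = db.item.bulkWrite(operations, { "ordered": false });
print("inserted in this batch: " + result.insertedCount);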
For MongoDB 2.6 releases there is another "bulk" constructor:
var db2 = connect('host2/db2');  // remote source connection
var bulk = db.item.initializeUnorderedBulkOp();
var count = 0;

db2.item.find({ "date": { "$gte": "2016-03-15" } }).forEach(function(doc) {
    bulk.insert(doc);
    count++;

    // Execute the batch and start a fresh one every 1000 documents
    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.item.initializeUnorderedBulkOp();
    }
});

// Execute any remaining partial batch
if ( count % 1000 != 0 ) {
    bulk.execute();
}
Of course the newer method is really just calling the same "older" methods underneath. The main point is consistency with other APIs: quite often the point is to "downgrade" the operations when working with a server version older than MongoDB 2.6 that has no "Bulk Operations" wire protocol, in which case the method simply loops and commits each operation in the batch for you.
In either case the "unordered" approach is best, since the operations can be committed on the server in "parallel" rather than "serially", which means multiple writes are actually happening at the same time.
Conclusion
So really, all of this is how the code is implemented in the external utilities anyway, just in a more organized and "low level" form. Naturally the "shell" does not compress data "over the wire" when communicating between hosts, nor does it have access to the "low level" write functions that a BSON library and low-level code would give you, both of which work much faster.
The "dump and restore" actually can work directly with a compressed BSON form of the data and commits the writes in a very efficient way. By that token, it is your best option for doing this rather than coding the implementation yourself.