70

How can documents be moved from one collection to another collection in MongoDB? For example: I have a lot of documents in collection A and I want to move all documents older than 1 month to collection B (these documents should no longer be in collection A).

Using aggregation we can copy documents (for instance with $out, as sketched below), but what I am trying to do is move them. What method can be used to move documents?
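
A rough sketch of the copy I mean (just an illustration; the createdAt field name is assumed):

var oneMonthAgo = new Date();
oneMonthAgo.setMonth(oneMonthAgo.getMonth() - 1);
db.A.aggregate([
    { $match: { createdAt: { $lt: oneMonthAgo } } },
    { $out: "B" }   // copies the matches into B (replacing B's contents), but they remain in A
]);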

JohnnyHK
  • 305,182
  • 66
  • 621
  • 471
manojpt
  • 1,179
  • 2
  • 12
  • 20

15 Answers

116

The bulk operations @markus-w-mahlberg showed (and @mark-mullin refined) are efficient but unsafe as written. If the bulkInsert fails, the bulkRemove will still continue. To make sure you don't lose any records when moving, use this instead:

function insertBatch(collection, documents) {
  var bulkInsert = collection.initializeUnorderedBulkOp();
  var insertedIds = [];
  var id;
  documents.forEach(function(doc) {
    id = doc._id;
    // Insert without raising an error for duplicates
    bulkInsert.find({_id: id}).upsert().replaceOne(doc);
    insertedIds.push(id);
  });
  bulkInsert.execute();
  return insertedIds;
}

function deleteBatch(collection, documents) {
  var bulkRemove = collection.initializeUnorderedBulkOp();
  documents.forEach(function(doc) {
    bulkRemove.find({_id: doc._id}).removeOne();
  });
  bulkRemove.execute();
}

function moveDocuments(sourceCollection, targetCollection, filter, batchSize) {
  print("Moving " + sourceCollection.find(filter).count() + " documents from " + sourceCollection + " to " + targetCollection);
  var count;
  while ((count = sourceCollection.find(filter).count()) > 0) {
    print(count + " documents remaining");
    var sourceDocs = sourceCollection.find(filter).limit(batchSize);
    var idsOfCopiedDocs = insertBatch(targetCollection, sourceDocs);

    var targetDocs = targetCollection.find({_id: {$in: idsOfCopiedDocs}});
    deleteBatch(sourceCollection, targetDocs);
  }
  print("Done!")
}
jasongarber
  • 2,136
  • 2
  • 18
  • 12
57

Update 2

Please do NOT upvote this answer any more. As written, @jasongarber's answer is better in every respect.

Update

This answer by @jasongarber is a safer approach and should be used instead of mine.


Provided I understood you correctly, and you want to move all documents older than 1 month, and you use MongoDB 2.6, there is no reason not to use bulk operations, which are the most efficient way of doing multiple operations I am aware of:

> var bulkInsert = db.target.initializeUnorderedBulkOp()
> var bulkRemove = db.source.initializeUnorderedBulkOp()
> var date = new Date()
> date.setMonth(date.getMonth() -1)
> db.source.find({"yourDateField":{$lt: date}}).forEach(
    function(doc){
      bulkInsert.insert(doc);
      bulkRemove.find({_id:doc._id}).removeOne();
    }
  )
> bulkInsert.execute()
> bulkRemove.execute()

This should be pretty fast and it has the advantage that in case something goes wrong during the bulk insert, the original data still exists.


Edit

To prevent too much memory from being used, you can execute the bulk operations every x docs processed:

> var bulkInsert = db.target.initializeUnorderedBulkOp()
> var bulkRemove = db.source.initializeUnorderedBulkOp()
> var x = 10000
> var counter = 0
> var date = new Date()
> date.setMonth(date.getMonth() -1)
> db.source.find({"yourDateField":{$lt: date}}).forEach(
    function(doc){
      bulkInsert.insert(doc);
      bulkRemove.find({_id:doc._id}).removeOne();
      counter ++
      if( counter % x == 0){
        bulkInsert.execute()
        bulkRemove.execute()
        bulkInsert = db.target.initializeUnorderedBulkOp()
        bulkRemove = db.source.initializeUnorderedBulkOp()
      }
    }
  )
> bulkInsert.execute()
> bulkRemove.execute()
Markus W Mahlberg
  • 19,711
  • 6
  • 65
  • 89
  • Or in UI tools like Robomongo db.getCollection('source').find({}).forEach(function(doc) { db.getCollection('target').insert(doc); db.getCollection('source').remove(doc);}) – Arthur Jan 06 '16 at 19:58
  • 3
    @Arthur: Your approach has two major drawbacks. It is *much* slower *and* you might have incomplete collections hard to synchronize again in a worst case scenario. – Markus W Mahlberg Jan 07 '16 at 00:10
  • This didn't work for me. I tried this on a collection with 50M records, and tried to move about 25M out. It failed on the find query with the error `Fatal error in CALL_AND_RETRY_2 # Allocation failed - process out of memory`. This is on a server with 32GB of memory, and the records only had 5 fields. The total data size of the collection is only about 5GB. – UpTheCreek Mar 11 '16 at 15:56
  • Do you mean put a limit on the find query? I didn't even execute anything, The out of mem error happened during the foreach. If the bulk features can only deal with 10k chunks then they are not really fit for purpose :/ (Sorry, not venting at you, and I appreciate your help, just feeling a bit frustrated!) – UpTheCreek Mar 11 '16 at 16:23
  • @UpTheCreek See edit. And I just want to rule out that we have a memory problem. Because it sounds like that. Possibly imposed by the OS. – Markus W Mahlberg Mar 11 '16 at 16:31
  • @MarkusWMahlberg - thanks - I'll try that as soon as I can and report back. (BTW, I'm on debian 8, with `ulimit -m unlimited`) – UpTheCreek Mar 11 '16 at 19:20
  • ICYMI, I published this code as NPM package - https://github.com/vladgolubev/mongo-move – Vlad Holubiev Jun 17 '18 at 16:13
22

Insert and remove:

var documentsToMove = db.collectionA.find({});
documentsToMove.forEach(function(doc) {
    db.collectionB.insert(doc);
    db.collectionA.remove(doc);
});

note: this method might be quite slow for large collections or collections holding large documents.

doron
  • 454
  • 6
  • 10
13

$out is used to create a new collection with the data, so use $out:

db.oldCollection.aggregate([{$out : "newCollection"}])

Then use drop:

db.oldCollection.drop()
karthi
  • 880
  • 6
  • 10
  • 3
    Bear in mind this will OVERWRITE the whole collection if it already exists with that name (instead of appending the matched documents from the old one)! – Josip Filipović Nov 12 '21 at 14:04
5

You can use a range query to get the data from sourceCollection, keep the cursor data in a variable, loop over it, and insert into the target collection:

 var doc = db.sourceCollection.find({
        "Timestamp":{
              $gte:ISODate("2014-09-01T00:00:00Z"),
              $lt:ISODate("2014-10-01T00:00:00Z")
        }
 });

 doc.forEach(function(doc){
    db.targetCollection.insert(doc);
 })
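
To turn this copy into a move, you could afterwards remove the copied documents from the source with the same range filter (a sketch; run it only once the inserts have been verified):

 db.sourceCollection.remove({
        "Timestamp":{
              $gte:ISODate("2014-09-01T00:00:00Z"),
              $lt:ISODate("2014-10-01T00:00:00Z")
        }
 });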

Hope it helps!

UpTheCreek
  • 31,444
  • 34
  • 152
  • 221
Ninad
  • 474
  • 8
  • 26
  • Yeahhhh!! Is insert() and remove() the optimized solution to do ?? – manojpt Nov 20 '14 at 12:31
  • this might help you http://stackoverflow.com/questions/5942575/is-moving-documents-between-collections-a-good-way-to-represent-state-changes-in – Ninad Nov 20 '14 at 12:35
5

Here's an update to @jasongarber's answer which uses the more recent mongo 'bulkWrite' operation (read the docs here), and also keeps the whole process asynchronous so you can run it as part of a wider script which depends on its completion.

async function moveDocuments (sourceCollection, targetCollection, filter) {
  const sourceDocs = await sourceCollection.find(filter)

  console.log(`Moving ${await sourceDocs.count()} documents from ${sourceCollection.collectionName} to ${targetCollection.collectionName}`)

  const idsOfCopiedDocs = await insertDocuments(targetCollection, sourceDocs)

  const targetDocs = await targetCollection.find({_id: {$in: idsOfCopiedDocs}})
  await deleteDocuments(sourceCollection, targetDocs)

  console.log('Done!')
}

async function insertDocuments (collection, documents) {
  const insertedIds = []
  const bulkWrites = []

  await documents.forEach(doc => {
    const {_id} = doc

    insertedIds.push(_id)
    bulkWrites.push({
      replaceOne: {
        filter: {_id},
        replacement: doc,
        upsert: true,
      },
    })
  })

  if (bulkWrites.length) await collection.bulkWrite(bulkWrites, {ordered: false})

  return insertedIds
}

async function deleteDocuments (collection, documents) {
  const bulkWrites = []

  await documents.forEach(({_id}) => {
    bulkWrites.push({
      deleteOne: {
        filter: {_id},
      },
    })
  })

  if (bulkWrites.length) await collection.bulkWrite(bulkWrites, {ordered: false})
}
Matt Wills
  • 676
  • 6
  • 11
  • Why are the `targetDocs` read from the `targetCollection` again? Does the bulk operation not return an error if anything goes wrong? – walderich Mar 11 '22 at 09:30
  • Just one typo in your script: L. 8 should become: const targetDocs = await sourceCollection.find({_id: {$in: idsOfCopiedDocs}}). You need docs from source collection to be deleted :) – Torsten Barthel Mar 30 '23 at 07:01
4

First option (using mongodump)

1. Get a dump of the source collection:

mongodump -d db -c source_collection

2. Restore it into the target collection:

mongorestore -d db -c target_collection dump/db/source_collection.bson
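
If only a subset should be moved (for example, the documents older than one month from the question), mongodump also accepts a query filter via -q/--query. A rough sketch (the exact query syntax can vary between mongodump versions, and the emailAddress filter just mirrors the aggregate example below):

mongodump -d db -c source_collection -q '{"emailAddress": "apitester@mailinator.com"}'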

Second Option

Running aggregate

db.getCollection('source_collection').aggregate([ { $match: {"emailAddress" : "apitester@mailinator.com"} }, { $out: "target_collection" } ])

Third Option (Slowest)

Running through a forEach loop

db.getCollection('source_collection').find().forEach(function(doc){
    db.getCollection('target_collection').insert(doc);
});
print("Rollback Completed!");

3

From a performance point of view it may be better to remove a lot of documents using a single command (especially if you have indexes for the query part) rather than deleting them one by one.

For example:

// assumes a date field (here called createdAt) and predefined start/end values
db.source.find({createdAt: {$gte: start, $lt: end}}).forEach(function(doc){
   db.target.insert(doc);
});
db.source.remove({createdAt: {$gte: start, $lt: end}});
alekseevi15
  • 1,732
  • 2
  • 16
  • 20
3

This is a restatement of @Markus W Mahlberg

Returning the favor - as a function

function moveDocuments(sourceCollection, targetCollection, filter) {
    var bulkInsert = targetCollection.initializeUnorderedBulkOp();
    var bulkRemove = sourceCollection.initializeUnorderedBulkOp();
    sourceCollection.find(filter).forEach(function(doc) {
        bulkInsert.insert(doc);
        bulkRemove.find({_id: doc._id}).removeOne();
    });
    bulkInsert.execute();
    bulkRemove.execute();
}

An example use

var x = {dsid:{$exists: true}};
moveDocuments(db.pictures,db.artifacts,x)

to move all documents that have a top-level element dsid from the pictures collection to the artifacts collection.

Mark Mullin
  • 1,340
  • 1
  • 9
  • 21
0

You can use the copyTo command (deprecated since MongoDB 3.0) with the following syntax:

db.source_collection.copyTo("target_collection")

Then you can use the drop command to remove the old collection:

db.source_collection.drop()
Diogo Rosa
  • 311
  • 2
  • 7
  • 1
    `copyTo` is deprecated since version 3.0: https://docs.mongodb.com/manual/reference/method/db.collection.copyTo/ Moreover, it does not support filters. – Rea Haas Apr 11 '21 at 13:19
0

I do like the response from @markus-w-mahlberg, however at times I have seen the need to keep it a bit simpler for people. As such I have a couple of functions below. You could naturally wrap things here with bulk operators as he did, but this code works with new and old Mongo systems equally.

function parseNS(ns){
    // Expects that the namespace does not contain extra periods, e.g. "foodb.foocollection.month.day.year";
    // if it does, pass an array instead.
    var database, collection;
    if (ns instanceof Array){
        database = ns[0];
        collection = ns[1];
    } else {
        var tNS = ns.split(".");
        if (tNS.length > 2){
            print('ERROR: NS had more than 1 period in it, please pass it as ["dbname", "coll.name.with.dots"]!');
            return false;
        }
        database = tNS[0];
        collection = tNS[1];
    }
    return {database: database, collection: collection};
}

function insertFromCollection(sourceNS, destNS, query, batchSize, pauseMS){
    // Parse and check namespaces
    var srcNS = parseNS(sourceNS);
    destNS = parseNS(destNS);
    if (srcNS == false || destNS == false){ return false; }

    var batchBucket = new Array();
    var totalToProcess = db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).find(query, {_id: 1}).count();
    var currentCount = 0;
    print("Processed " + currentCount + "/" + totalToProcess + "...");
    db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).find(query).addOption(DBQuery.Option.noTimeout).forEach(function(doc){
        batchBucket.push(doc);
        if (batchBucket.length > batchSize){
            db.getSiblingDB(destNS.database).getCollection(destNS.collection).insert(batchBucket);
            currentCount += batchBucket.length;
            batchBucket = [];
            sleep(pauseMS);
            print("Processed " + currentCount + "/" + totalToProcess + "...");
        }
    });
    // Flush the final, partially filled batch
    if (batchBucket.length > 0){
        db.getSiblingDB(destNS.database).getCollection(destNS.collection).insert(batchBucket);
        currentCount += batchBucket.length;
        print("Processed " + currentCount + "/" + totalToProcess + "...");
    }
    print("Completed");
}

/** Example Usage:
        insertFromCollection("foo.bar", "foo2.bar", {"type": "archive"}, 1000, 20);
*/

You could obviously add a db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).remove(query, true) call if you wanted to also remove the records after they are copied to the new location (see the sketch below). The code can easily be built like that to make it restartable.
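
For example, a minimal wrapper might look like this (an untested sketch reusing the functions above; the name moveFromCollection is made up here, and it removes all matches rather than just one):

function moveFromCollection(sourceNS, destNS, query, batchSize, pauseMS){
    // copy the matching documents over in batches, as defined above
    insertFromCollection(sourceNS, destNS, query, batchSize, pauseMS);
    // then remove them from the source collection
    var srcNS = parseNS(sourceNS);
    db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).remove(query);
}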

0

I had 2297 collections for 15 million documents, but some collections were empty.

Using only copyTo the script failed, but with this optimization:

db.getCollectionNames().forEach(function(collname) {
    var c = db.getCollection(collname).count();
    if(c!==0){
      db.getCollection(collname).copyTo('master-collection');
      print('Copied collection ' + collname);
    }
});

everything worked fine for me.

NB: copyTo is deprecated because it blocks read/write operations, so I think this is fine if you know that the database is not usable during this operation.

Manuel Spigolon
  • 11,003
  • 5
  • 50
  • 73
0

In my case, forEach didn't work, so I had to make some changes.

var mongoose = require('mongoose');

var kittySchema = new mongoose.Schema({
    name: String
});

var Kitten = mongoose.model('Kitten', kittySchema);

var catSchema = new mongoose.Schema({
    name: String
});

var Cat = mongoose.model('Cat', catSchema);

These are the models for both collections.

function Recursion(){
    Kitten.findOne().lean().exec(function(error, results){
        if (!error) {
            var objectResponse = results;
            var RequiredId = objectResponse._id;
            delete objectResponse._id;
            var swap = new Cat(objectResponse);
            swap.save(function (err) {
                if (err) {
                    return err;
                }
                else {
                    console.log("SUCCESSFUL");
                    Kitten.deleteOne({ _id: RequiredId }, function(err) {
                        if (!err) {
                            console.log('notification!');
                        }
                        else {
                            return err;
                        }
                    });
                    Recursion();
                }
            });
        }
        if (error) {
            console.log("No object found");
            // return error;
        }
    });
}
0

Based on Matt Wills' answer, here is a fully working TypeScript version. It's based on NestJS. The following is the implementation from a service class:

...
async moveDocuments(
        sourceCollection: Model<any>,
        targetCollection: Model<any>,
        filter: any
    ) {
        const sourceDocs = await sourceCollection.find(filter)

        console.log(sourceDocs.length)

        console.log(
            `Moving ${await sourceDocs.length} documents from ${
                sourceCollection.modelName
            } to ${targetCollection.modelName}`
        )

        const idsOfCopiedDocs = await this.insertDocuments(
            targetCollection,
            sourceDocs
        ).catch((err) => {
            return Promise.reject(err)
        })

        const targetDocs = await sourceCollection.find({
            _id: { $in: idsOfCopiedDocs },
        })

        await this.deleteDocuments(sourceCollection, targetDocs).catch(
            (err) => {
                return Promise.reject(err)
            }
        )

        return Promise.resolve('Cleanups done!')
    }

    private async insertDocuments(
        collection: Model<any>,
        documents: Document[]
    ) {
        const insertedIds = []
        const bulkWrites = []

        await documents.forEach((doc) => {
            const { _id } = doc

            insertedIds.push(_id)
            bulkWrites.push({
                replaceOne: {
                    filter: { _id },
                    replacement: doc,
                    upsert: true,
                },
            })
        })

        if (bulkWrites.length)
            await collection.bulkWrite(bulkWrites, { ordered: false })

        return insertedIds
    }

    private async deleteDocuments(
        collection: Model<any>,
        documents: Document[]
    ) {
        const bulkWrites = []

        await documents.forEach(({ _id }) => {
            bulkWrites.push({
                deleteOne: {
                    filter: { _id },
                },
            })
        })

        if (bulkWrites.length)
            await collection.bulkWrite(bulkWrites, { ordered: false })
    }

And that's how you might call the service with the help of dayjs:

...
const now = dayjs()
const xMonthBeforeNow = now.subtract(5, 'months')
const filter = {
    createdAt: {
        $lte: xMonthBeforeNow.toDate(),
    },
}

const cleanupResult = await this.cleaningService.moveDocuments(
            this.anyModel,
            this.anyModelOld,
            filter
        )
...
Torsten Barthel
  • 3,059
  • 1
  • 26
  • 22
-1

I planned to archive 1000 records at a time using the bulk insert and bulk delete methods of pymongo.

For both source and target

  1. create mongodb objects to connect to the database.

  2. Instantiate the bulk objects. Note: I created backup bulk objects too. These will help me roll back the insertions or removals when an error occurs. Example:

    For source:

        # replace this with mongodb object creation logic
        source_db_obj = db_help.create_db_obj(source_db, source_col)
        source_bulk = source_db_obj.initialize_ordered_bulk_op()
        source_bulk_bak = source_db_obj.initialize_ordered_bulk_op()

    For target:

        # replace this with mongodb object creation logic
        target_db_obj = db_help.create_db_obj(target_db, target_col)
        target_bulk = target_db_obj.initialize_ordered_bulk_op()
        target_bulk_bak = target_db_obj.initialize_ordered_bulk_op()

  3. Obtain the source records that matches the filter criteria

    source_find_results = source_db_obj.find(filter)

  4. Loop through the source records

    create target and source bulk operations

    Append archived_at field with the current datetime to the target collection

    # replace this with the logic to obtain the UTC time
    doc['archived_at'] = db_help.getUTCTime()
    target_bulk.insert(doc)
    source_bulk.find({'_id': doc['_id']}).remove_one()

    for rollback in case of any errors or exceptions, create target_bulk_bak and source_bulk_bak operations.

    target_bulk_bak.find({'_id': doc['_id']}).remove_one()
    # remove the extra column before re-inserting the original document
    doc.pop('archived_at', None)
    source_bulk_bak.insert(doc)

  5. When the record count reaches 1000, execute the target bulk insertion and source bulk removal. Note: this method takes the target_bulk and source_bulk objects for execution.

    execute_bulk_insert_remove(source_bulk, target_bulk)

  6. When an exception occurs, execute the target_bulk_bak removals and source_bulk_bak insertions. This rolls back the changes. Since MongoDB doesn't have rollback, I came up with this hack.

    execute_bulk_insert_remove(source_bulk_bak, target_bulk_bak)

  7. Finally, re-initialize the source and target bulk and bulk_bak objects. This is necessary because you can use them only once.

  8. Complete code

        def execute_bulk_insert_remove(source_bulk, target_bulk):
            try:
                target_bulk.execute()
                source_bulk.execute()
            except BulkWriteError as bwe:
                raise Exception(
                    "could not archive document, reason:    {}".format(bwe.details))
    
        def archive_bulk_immediate(filter, source_db, source_col, target_db, target_col):
            """
            filter: filter criteria for backup
            source_db: source database name
            source_col: source collection name
            target_db: target database name
            target_col: target collection name
            """
            count = 0
            bulk_count = 1000
    
            source_db_obj = db_help.create_db_obj(source_db, source_col)
            source_bulk = source_db_obj.initialize_ordered_bulk_op()
            source_bulk_bak = source_db_obj.initialize_ordered_bulk_op()
    
            target_db_obj = db_help.create_db_obj(target_db, target_col)
            target_bulk = target_db_obj.initialize_ordered_bulk_op()
            target_bulk_bak = target_db_obj.initialize_ordered_bulk_op()
    
            source_find_results = source_db_obj.find(filter)
    
            start = datetime.now()
    
            for doc in source_find_results:
                doc['archived_at'] = db_help.getUTCTime()
    
                target_bulk.insert(doc)
                source_bulk.find({'_id': doc['_id']}).remove_one()
                target_bulk_bak.find({'_id': doc['_id']}).remove_one()
                doc.pop('archived_at', None)
                source_bulk_bak.insert(doc)
    
                count += 1
    
                if count % 1000 == 0:
                    logger.info("count: {}".format(count))
                    try:
                        execute_bulk_insert_remove(source_bulk, target_bulk)
                    except BulkWriteError as bwe:
                        execute_bulk_insert_remove(source_bulk_bak, target_bulk_bak)
                        logger.info("Bulk Write Error: {}".format(bwe.details))
                        raise
    
                    source_bulk = source_db_obj.initialize_ordered_bulk_op()
                    source_bulk_bak = source_db_obj.initialize_ordered_bulk_op()
    
                    target_bulk = target_db_obj.initialize_ordered_bulk_op()
                    target_bulk_bak = target_db_obj.initialize_ordered_bulk_op()
    
            end = datetime.now()
    
            logger.info("archived {} documents to {} in ms.".format(
                count, target_col, (end - start)))
    
Preethi Lakku
  • 89
  • 1
  • 4
  • 2
    Hi and welcome to Stack Overflow! Take a minute to read through [answer] - this looks helpful but it would benefit from some explanation of what the code does, consider [edit]-ing that in? – user812786 Dec 29 '16 at 19:32