
I have a requirement where I have to copy documents inserted in the last six hours to another collection, and this has to run periodically. Also, if a document already exists in the target collection, I have to update it with the document from the source collection.

Some stats about the source collection -

  • Source collection has indexes on 'JobId' and 'ModifiedDate'.
  • Source collection can receive up to 6 million insert/update events per day.

I referred to these links to come up with this code - move docs from one coll to another & Bulk.find.upsert

var copyDocsToJobModel = db.JobsModel.initializeUnorderedBulkOp()

var x = 5000  // flush the bulk op every 5000 documents
var counter = 0
// Date.now() is in milliseconds, so the six-hour offset must be in ms too
var lastCopyTime = new Date(Date.now() - 6 * 60 * 60 * 1000)

var prev_count = db.JobsModel.count()

db.Messages.find({"ModifiedDate": {$gte: lastCopyTime}}).forEach(
  function(doc){
    delete doc._id  // let the target collection keep/assign its own _id
    copyDocsToJobModel.find({'JobId': doc.JobId}).upsert().updateOne(doc);
    counter++
    if (counter % x == 0) {
      copyDocsToJobModel.execute()
      copyDocsToJobModel = db.JobsModel.initializeUnorderedBulkOp()
    }
  }
)

// Flush the remainder; executing an empty bulk op throws, so guard it
var resp
if (counter % x != 0) {
  resp = copyDocsToJobModel.execute()
}

var curr_count = db.JobsModel.count()
[prev_count, curr_count]
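One subtlety in the cutoff computation: `Date.now()` returns milliseconds, so the six-hour offset must be `6 * 60 * 60 * 1000`, not `6 * 60 * 60` (the latter is only 21.6 seconds). A minimal standalone check of the arithmetic, with `hoursAgo` as an illustrative helper name:

```javascript
// Compute a Date N hours before a reference time (defaults to now).
// Date arithmetic in JavaScript is in milliseconds, so the offset
// is hours * 60 * 60 * 1000.
function hoursAgo(hours, referenceMs) {
  const ms = referenceMs === undefined ? Date.now() : referenceMs;
  return new Date(ms - hours * 60 * 60 * 1000);
}

// Example: six hours before 2024-01-01T12:00:00Z is 06:00:00Z
const cutoff = hoursAgo(6, Date.UTC(2024, 0, 1, 12, 0, 0));
```

Passing the result directly as the `$gte` value works because the mongo shell stores `Date` objects as BSON dates.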

The code is working as expected, but I have the following queries -

  • Due to the high amount of traffic, we don't want to block writes/updates while copying documents to the target collection.
  • We want to keep the system load as low as possible while copying records.
  • Is there any further optimization that can be done here to make the script run faster or consume fewer resources?
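For reference, the batching pattern in the script boils down to the plain function below, which also shows why the final `execute()` needs a guard: the shell throws on a bulk op with no queued operations. `copyInBatches` and `applyBatch` are illustrative names, not MongoDB API:

```javascript
// Sketch of the flush logic: queue items, apply a batch every
// `batchSize` items, and apply the remainder only if non-empty.
// `applyBatch` stands in for bulk.execute().
function copyInBatches(docs, batchSize, applyBatch) {
  let batch = [];
  let flushes = 0;
  for (const doc of docs) {
    batch.push(doc);
    if (batch.length === batchSize) {
      applyBatch(batch);
      flushes++;
      batch = [];
    }
  }
  // Skip the final flush when the remainder is empty; executing
  // an empty bulk operation in the mongo shell raises an error.
  if (batch.length > 0) {
    applyBatch(batch);
    flushes++;
  }
  return flushes;
}
```

For example, five documents with a batch size of two produce three flushes of sizes 2, 2, and 1, while four documents produce exactly two flushes and no empty trailing one.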

Thanks in advance.

Anurag Sharma