I have a requirement where I have to copy documents inserted in the last six hours to another collection, and I have to do this periodically. Also, if a document already exists in the target collection, I have to update it with the document from the source collection.
Some stats about the source collection -
- Source collection has indexes on 'JobId' and 'ModifiedDate'.
- Source collection can receive up to 6 million insert/update events per day.
I referred to these links to come up with this code - move docs from one coll to another & Bulk.find.upsert
var copyDocsToJobModel = db.JobsModel.initializeUnorderedBulkOp();
var batchSize = 5000;
var counter = 0;
// Date arithmetic is in milliseconds: 6 hours = 6 * 60 * 60 * 1000 ms.
// Without the * 1000 this only went back ~22 seconds, not 6 hours.
var lastCopyTime = new Date(Date.now() - 6 * 60 * 60 * 1000);
var prev_count = db.JobsModel.count();
db.Messages.find({"ModifiedDate": {$gte: lastCopyTime}}).forEach(
    function(doc) {
        delete doc._id;  // let the target collection keep/assign its own _id
        copyDocsToJobModel.find({'JobId': doc.JobId}).upsert().updateOne(doc);
        counter++;
        // Flush every batchSize operations and start a fresh bulk builder
        if (counter % batchSize === 0) {
            copyDocsToJobModel.execute();
            copyDocsToJobModel = db.JobsModel.initializeUnorderedBulkOp();
        }
    }
);
// Flush the remainder; execute() throws if the bulk has no queued operations
if (counter % batchSize !== 0) {
    var resp = copyDocsToJobModel.execute();
}
var curr_count = db.JobsModel.count();
[prev_count, curr_count];
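For context, here is one direction I have been considering: if the server is MongoDB 4.2 or newer, the whole copy can apparently be pushed server-side with an aggregation `$merge` stage, avoiding the round-trip per document entirely. This is only a sketch under that version assumption; the function name `copyRecentJobs` is mine, and `$merge ... on: "JobId"` requires a unique index on `JobId` in the target collection:

```javascript
// Sketch (assumes MongoDB 4.2+): server-side upsert-copy with $merge.
// "db" is the mongo shell's database handle, passed in for clarity.
function copyRecentJobs(db, windowMs) {
    var cutoff = new Date(Date.now() - windowMs);
    return db.Messages.aggregate([
        // Only documents modified inside the window (uses the ModifiedDate index)
        { $match: { ModifiedDate: { $gte: cutoff } } },
        // Drop _id so the target document keeps its own _id
        { $project: { _id: 0 } },
        // Upsert into the target: replace on JobId match, insert otherwise.
        // NOTE: requires a unique index on JobId in JobsModel.
        { $merge: {
            into: "JobsModel",
            on: "JobId",
            whenMatched: "replace",
            whenNotMatched: "insert"
        } }
    ]);
}

// e.g. copyRecentJobs(db, 6 * 60 * 60 * 1000) for a six-hour window
```

I am not sure whether `$merge` would behave better under our write load than the bulk loop, which is partly why I am asking.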
The code is working as expected, but I have the following questions -
- Due to the high volume of traffic, we don't want to block writes/updates on the source collection while copying documents to the target collection.
- We want to keep the system load as low as possible while copying records.
- Is there any further optimization that can make the script run faster or consume fewer resources?
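On the load question, one idea I have been experimenting with is tuning the read side of the copy. This is only a sketch with my own hypothetical helper name `recentMessagesCursor`; it assumes a replica set (for `readPref`) and uses the standard shell cursor modifiers:

```javascript
// Sketch (assumes a replica set): read the source collection from a
// secondary so the copy competes less with writes on the primary.
function recentMessagesCursor(db, windowMs, batch) {
    var cutoff = new Date(Date.now() - windowMs);
    return db.Messages
        .find({ ModifiedDate: { $gte: cutoff } })
        .readPref("secondaryPreferred")  // offload the scan from the primary
        .batchSize(batch)                // smaller batches smooth memory use
        .noCursorTimeout();              // a long copy may exceed the 10-min idle timeout
}

// e.g. recentMessagesCursor(db, 6 * 60 * 60 * 1000, 500).forEach(...)
```

Would reading from a secondary like this be a reasonable way to meet the "low system load" goal, or does it risk copying stale documents?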
Thanks in advance.