I have an existing collection in MongoDB and I need to update certain documents by getting values from a MySQL table.
Currently, I am using the following python code to update the Collection using MongoEngine but it takes a lot of time since the dataset is very large (2m+ documents in the Collection, out of which 80k+ documents need to be updated)
query = "Select * from business"
cursor.execute(query)
for line in cursor:
Movie.objects(movieid=line[0]).first().update(set__business=line[1])
While updating early documents (like first 5k documents) it takes very less amount of time, but when it starts to go beyond that, it takes forever. My assumption is that it starts finding the record from the beginning every time. It can be somehow optimized, since MySQL table records are ordered.
Is there a way to search in the collection so that the next loop starts from the last updated documentid? What is the best method to update MongoDB collections using MongoEngine?
(Additional info: inserting 2m+ records took about 2 hours whereas updating 80k records took 25 hours)