
Link to similar problem
(assuming above link provides context)

For a very short period, MongoDB (in my case) receives a lot of connections. If any insert happens during that window, multiple duplicate documents get created.

Note that my code saves only one document, yet MongoDB stores duplicates with different _ids. Many solutions suggest adding a unique index on the collection, but that isn't possible in my case.

What is the best way to make sure that one insert operation results in exactly one entry in the database?
Also, why are duplicate documents created? I use mongoengine. Is it MongoDB, mongoengine or the web server (Apache) that is retrying the insert calls, and why?

Versions used:
mongoengine 0.8.7
pymongo 2.8.1
mongodb 2.6.12
Python 2.7.12

  • Dear Aparna, welcome to StackOverflow. Please read https://stackoverflow.com/help/on-topic. The question should include all the context within the question itself, and since you have found the similar question, you may want to explain why its answer doesn't work for you. – Alex Blex Feb 28 '18 at 13:07
  • I'm not sure if duplicate records are getting created because of excessive connections or some other cause. Also, no answer in the similar question explains why the duplication occurs. – Aparna Kulkarni Feb 28 '18 at 13:29
  • MongoDB usually does not duplicate documents on its own. Most likely the duplication happens because your app sends multiple identical inserts. You need to debug it. Try enabling the profiler to see the details of the queries your app sends. – Alex Blex Feb 28 '18 at 13:52
  • I will debug it. Thanks for the response @Alex. Another question: to insert, I use the `save()` method of `mongoengine`, which updates the record if `_id` is given and inserts otherwise. I create an object of the collection type, fill in its attributes and `save()` it (note that it doesn't have an `_id` field). `save()` then generates an `_id` and inserts the document. Is there a possibility that `save()` itself is making many calls with different `_id`s? Should I use `update(upsert=True, **kwargs)` instead? – Aparna Kulkarni Feb 28 '18 at 14:19
  • It must be the driver that generates the _id, not mongoengine. Upserts will help hide the problem and prevent duplicates, but if I were you, I'd try to fix the cause of the problem instead. It's very unlikely that `save()` results in multiple inserts. If it does and you can reliably reproduce it in isolation, please file a bug report at https://github.com/MongoEngine/mongoengine/issues – Alex Blex Feb 28 '18 at 14:30
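To illustrate the point made in these comments: when a document is inserted without an `_id`, a fresh id is generated for each insert, so the same request arriving twice produces two documents with different `_id`s, which matches what the question describes. The sketch below is a simplified in-memory model, not mongoengine or PyMongo code; `FakeCollection`, `new_id()` (a stand-in for `bson.ObjectId`) and `DuplicateKey` are hypothetical names. It compares a plain insert sent twice, an upsert keyed on the document's fields, and an insert whose `_id` is assigned once before any retry.

```python
import uuid


class DuplicateKey(Exception):
    """Hypothetical stand-in for pymongo.errors.DuplicateKeyError."""


def new_id():
    # Hypothetical stand-in for bson.ObjectId(): every call returns a new id.
    return uuid.uuid4().hex


class FakeCollection(object):
    """Hypothetical in-memory collection whose only unique key is _id."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        doc = dict(doc)
        # Like an insert without _id: a fresh id is generated per call.
        doc.setdefault("_id", new_id())
        if doc["_id"] in self.docs:
            raise DuplicateKey(doc["_id"])
        self.docs[doc["_id"]] = doc
        return doc["_id"]

    def upsert(self, query, fields):
        # Update the first document matching query; insert only if none match.
        for doc in self.docs.values():
            if all(doc.get(k) == v for k, v in query.items()):
                doc.update(fields)
                return doc["_id"]
        return self.insert(dict(query, **fields))


payload = {"field1": "value1", "field2": "value2"}

# 1. The same insert sent twice (e.g. a retried request), no _id given.
plain = FakeCollection()
plain.insert(payload)
plain.insert(payload)
print(len(plain.docs))  # 2: duplicates with different _ids

# 2. The same request sent twice as an upsert keyed on the payload fields.
keyed = FakeCollection()
keyed.upsert(payload, {"status": "saved"})
keyed.upsert(payload, {"status": "saved"})
print(len(keyed.docs))  # 1

# 3. _id assigned once, before any retry; the retry reuses it.
fixed = FakeCollection()
doc = dict(payload, _id=new_id())
fixed.insert(doc)
try:
    fixed.insert(doc)
except DuplicateKey:
    pass  # the retry is rejected instead of being stored twice
print(len(fixed.docs))  # 1
```

The third case relies on MongoDB always enforcing a unique index on `_id` for regular collections, so a retry that reuses the same `_id` is rejected rather than stored twice; with the real driver this surfaces as `pymongo.errors.DuplicateKeyError`.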


An upsert is another way to avoid duplicate entries (see the MongoDB bulk upsert documentation and the PyMongo bulk upsert documentation):

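```python
from pymongo import MongoClient

# Connection string and database name are placeholders.
client = MongoClient("mongodb://localhost:27017")
db = client.mydb

# Unordered bulk API (available in PyMongo 2.7+, removed in PyMongo 4.0).
updateBulk = db.collection.initialize_unordered_bulk_op()

# Update the document matching these fields, or insert it if none matches.
updateBulk.find({
    "field1": "field1",
    "field2": "field2",
    # ... remaining fields ...
    "fieldn": "fieldn",
}).upsert().update_one({"$set": {
    "field1": "field1",
    "field2": "field2",
    # ... remaining fields ...
    "fieldn": "fieldn",
}})

result1 = updateBulk.execute()
```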
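One caveat that matters for the burst of connections described in the question: without a unique index on the filter fields, two upserts running at the same time can both find no match and both insert. MongoDB's upsert documentation recommends a unique index on the filter fields for this reason. The following is a deterministic, single-threaded model of that interleaving, not real driver code; the `find` helper and the hand-fixed ordering are assumptions made for illustration.

```python
# Simplified model of two upserts racing on a collection with no unique
# index on the filter fields. The ordering below is fixed by hand to show
# one possible interleaving; it is not real MongoDB or PyMongo code.

docs = []  # stands in for the collection
query = {"field1": "field1", "field2": "field2"}


def find(collection, selector):
    # Return every document whose fields match the selector exactly.
    return [d for d in collection
            if all(d.get(k) == v for k, v in selector.items())]


# Step 1: both connections look for a match before either has inserted.
found_by_a = find(docs, query)
found_by_b = find(docs, query)

# Step 2: each connection saw no match, so each performs the insert half
# of its upsert.
if not found_by_a:
    docs.append(dict(query))
if not found_by_b:
    docs.append(dict(query))

print(len(docs))  # 2: the upserts did not prevent the duplicate
```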
Murugan Perumal