
I am performing bulk inserts in MongoDB with `InsertManyAsync()` from multiple threads. Say there are two collections, J and C. On every update of a J document I fetch all C documents, and similarly on every update of a C document I fetch all J documents, and then perform an insert into a third collection, say XXX.

For example (pseudo): say there are documents J1..J3 and C1..C3. If J1 is updated, an operation happens with J1 × (C1..C3), i.e. the pairs (J1,C1), (J1,C2), (J1,C3) are inserted into XXX, as sketched below.
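
For reference, here is roughly what the insert path looks like. This is a minimal sketch, not my actual code: the type, field, and collection names (`CDoc`, `XxxDoc`, `JId`, `CId`) are placeholders for my real schema, and I'm using the C# driver's `InsertManyAsync`:

```csharp
using System.Linq;
using System.Threading.Tasks;
using MongoDB.Driver;

// Placeholder document types -- not my real schema.
public class CDoc   { public string Id { get; set; } }
public class XxxDoc { public string JId { get; set; } public string CId { get; set; } }

public static class XxxWriter
{
    // Called whenever a J document is updated: fetch all C documents
    // and bulk-insert one XXX record per (J, C) pair, i.e. J1 x (C1..C3).
    public static async Task OnJUpdatedAsync(
        string jId,
        IMongoCollection<CDoc> cCollection,
        IMongoCollection<XxxDoc> xxxCollection)
    {
        var allC = await cCollection.Find(Builders<CDoc>.Filter.Empty).ToListAsync();

        var records = allC.Select(c => new XxxDoc { JId = jId, CId = c.Id });
        await xxxCollection.InsertManyAsync(records);
    }
}
```

The C-update path is symmetric: fetch all J documents and insert the (J, C) pairs.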

This entire operation happens on multiple threads, and I am seeing some duplicate records in collection XXX, i.e. records with the same J and C ids.

Question: could the duplicates be occurring because the bulk inserts happen on multiple threads? That is, while thread T1 is performing the insert for record (C1, J1), another thread is performing the same insert at the same time, hence the duplicates.

I found that MongoDB takes a row-level (document-level) lock while performing an insert, so I suspect there is a chance for this situation to occur, but I'm not sure.

Can anyone please shed some light on this?

This is my first time working with MongoDB, so I have no real idea about its internals (my sole experience is with relational databases, and MongoDB concurrency doesn't work the way relational DBs do).

EDIT: From some documentation, I learned that an INSERT in MongoDB acquires a write lock/latch, and that there can be only one writer lock at any point in time per database / per collection.

With that, no matter how many parallel inserts happen, they will be queued and will not execute simultaneously, and thus the scenario I described should never occur.
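
(Side note: as far as I understand, MongoDB only enforces uniqueness through indexes — `_id` or an explicit unique index — so even serialized inserts would not be deduplicated by the lock itself. As a safeguard, I'm considering a unique compound index on the (J, C) pair, which would make the server reject a second insert of the same pair. A sketch, using the same placeholder names as above and assuming a reasonably recent C# driver:

```csharp
// One-time setup: a unique compound index on (JId, CId). With this in
// place, a second insert of the same (J, C) pair fails with a
// duplicate-key error, no matter how many threads are writing.
var keys = Builders<XxxDoc>.IndexKeys
    .Ascending(x => x.JId)
    .Ascending(x => x.CId);

await xxxCollection.Indexes.CreateOneAsync(
    new CreateIndexModel<XxxDoc>(keys, new CreateIndexOptions { Unique = true }));
```

Passing `new InsertManyOptions { IsOrdered = false }` to `InsertManyAsync` would then let the bulk insert skip the rejected duplicates and continue with the rest of the batch, surfacing the duplicate-key errors in a `MongoBulkWriteException`.)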

Can someone please confirm, or let me know otherwise?

Rahul
  • From a database perspective, there is little to no chance of duplicate records in the sense of duplicate `_id` generation. However, if you are getting duplicate records with distinct `_id`s, I'd suggest revisiting your threading approach; most probably you'll find some kind of data race. – Saleem Jun 30 '16 at 23:29
  • @Saleem, could be, and I'm not denying that at all. I just wanted to be sure, so that I can concentrate on that part. – Rahul Jun 30 '16 at 23:31
  • And my two cents: make sure your threads have private data and don't share anything. Data parallelism, a.k.a. divide and conquer. – Saleem Jun 30 '16 at 23:35
  • @Saleem, yes, that has already been taken care of. I don't expect any duplicates, yet I am getting them; I may need to revisit my logic. – Rahul Jun 30 '16 at 23:37
  • 1
    I am experiencing duplicate `_id` error if I try to do bulk insert using multiprocessing. I am using pymongo(mongodb python driver) – Sohaib Farooqi Nov 15 '17 at 04:50

0 Answers