0

I'm writing a script that loads a large number of XML files into MongoDB. When I run the script more than once, the same object is added to the same collection multiple times.

I looked for a way to prevent this by checking whether the object already exists before adding it, but I couldn't find one.

help!

shx2
Oussama L.

2 Answers

1

The term for the operation you're describing is "upsert".

In MongoDB, you upsert by calling the update operation with upsert=True: if a document matches the query it is updated, and if nothing matches a new document is inserted.
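As a rough illustration of the upsert approach, here is a minimal sketch assuming a recent pymongo, where the call is `update_one(filter, update, upsert=True)`. The helper name `upsert_document` and the `source_file` key are hypothetical; use whichever field uniquely identifies one of your XML documents.

```python
def upsert_document(collection, doc, key_field):
    """Insert doc if nothing shares its key_field value, otherwise update it.

    `collection` is assumed to be a pymongo Collection; `key_field` is a
    hypothetical name for whatever field identifies a document.
    """
    # Match on the identifying field only.
    query = {key_field: doc[key_field]}
    # Leave _id out of $set so an existing document's _id is never rewritten.
    fields = {k: v for k, v in doc.items() if k != "_id"}
    return collection.update_one(query, {"$set": fields}, upsert=True)


# Usage sketch (needs pymongo and a running server; names are placeholders):
#   from pymongo import MongoClient
#   coll = MongoClient()["mydb"]["xml_docs"]
#   upsert_document(coll, {"source_file": "a.xml", "title": "A"}, "source_file")
```

With this, running the script again updates the matching document in place instead of adding a second copy.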

shx2
0

You can create an index on one or more fields (other than _id) of the document/XML structure. Then use find to check whether a document with that indexed_field: value pair is already present in the collection. If the query returns nothing, insert the new document. This ensures that only new documents are inserted when you re-run the script.
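The check-then-insert approach could look something like the sketch below, assuming a recent pymongo (`create_index`, `find_one`, `insert_one`). The function name `load_documents` and the `source_file` key are made up for illustration.

```python
def load_documents(collection, docs, key_field):
    """Insert only the docs whose key_field value is not in the collection yet.

    `collection` is assumed to be a pymongo Collection; `key_field` is a
    hypothetical identifying field. Returns the number of documents inserted.
    """
    # Index the lookup field once so each existence check is fast.
    # create_index does nothing if the same index already exists.
    collection.create_index(key_field)

    inserted = 0
    for doc in docs:
        # find_one returns None when no document matches the query.
        if collection.find_one({key_field: doc[key_field]}) is None:
            collection.insert_one(doc)
            inserted += 1
    return inserted


# Usage sketch (needs pymongo and a running server; names are placeholders):
#   from pymongo import MongoClient
#   coll = MongoClient()["mydb"]["xml_docs"]
#   load_documents(coll, [{"source_file": "a.xml", "title": "A"}], "source_file")
```

Keep in mind that the lookup and the insert are two separate operations, so this sketch assumes only one copy of the script is writing to the collection at a time.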

vmr
  • Actually, I found that something like db.collection.find(object).limit(1) returns the first object it finds matching "object". I can't use $exists like this: db.collection.find(o, {$exists: true}), can I? – Oussama L. Sep 12 '14 at 11:51
  • Apologies for the confusion; "$exists" does not make sense in this scenario. Just use "find", but on an indexed field so that the query is faster. – vmr Sep 12 '14 at 11:56