I am trying to do a bulk insert in MongoDB using PyMongo. I have millions of product/review documents to insert into MongoDB. Here is the structure of the document:

{
    "_id" : ObjectId("553858a14483e94d1e563ce9"),
    "product_id" : "B000GIKZ4W",
    "product_category" : "Arts",
    "product_brand" : "unknown",
    "reviews" : [
        {
            "date" : ISODate("2012-01-09T00:00:00Z"),
            "score" : 3,
            "user_id" : "A3DLA3S8QKLBNW",
            "sentiment" : 0.2517857142857143,
            "text" : "The ink was pretty dried up upon arrival. It was...",
            "user_gender" : "male",
            "voted_total" : 0,
            "voted_helpful" : 0,
            "user_name" : "womans_roar \"rohrra\"",
            "summary" : "Cute stamps but came with dried up ink"
        }
    ],
    "product_price" : "9.43",
    "product_title" : "Melissa & Doug Deluxe Wooden Happy Handle Stamp Set"
} 

There can be multiple reviews for a single product. The requirement is to insert one document per product_id and keep appending further reviews as subdocuments in the reviews array. Can you please provide some pointers on how this can be achieved? Also, it would be nice to implement bulk insert for performance.

vaultah
Randeep

1 Answer

it would be nice to implement bulk insert for performance.

In PyMongo you can execute either ordered or unordered bulk write operations.
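As a rough sketch of what a batched bulk write could look like for this use case, the snippet below queues one `UpdateOne` per review and sends them with `bulk_write`. The helper names (`chunked`, `review_update`, `bulk_append_reviews`), the batch size of 1000, and the choice to match on `product_id` are assumptions made for illustration; `UpdateOne` and `Collection.bulk_write` are part of PyMongo 3 and newer.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def review_update(product_id, review):
    """Return the (filter, update) pair that appends one review to a product."""
    return {"product_id": product_id}, {"$push": {"reviews": review}}


def bulk_append_reviews(collection, pairs, batch_size=1000, ordered=True):
    """Send (product_id, review) pairs to MongoDB in batches (hypothetical helper)."""
    # Imported here so the pure helpers above work without PyMongo installed.
    from pymongo import UpdateOne

    for batch in chunked(pairs, batch_size):
        requests = [
            UpdateOne(*review_update(product_id, review), upsert=True)
            for product_id, review in batch
        ]
        # ordered=True stops at the first error; ordered=False keeps going
        # but makes no promise about the order in which operations run.
        collection.bulk_write(requests, ordered=ordered)
```

Passing `ordered=False` may be faster for large loads, at the cost of not preserving the order in which reviews are appended.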

The requirement is to insert one document per product_id and keep appending further reviews as subdocuments in the reviews array

You can use update_one or update_many (PyMongo 3 or newer), or the older update method (deprecated in PyMongo 3 and removed in PyMongo 4), to $push a subdocument onto the reviews array:

collection.update_one({"_id": <doc_id>}, {"$push": {"reviews": <subdocument>}})

or

collection.update({"_id": <doc_id>}, {"$push": {"reviews": <subdocument>}})

To insert a new document when no document matches the given criteria, use the upsert option:

collection.update({"_id": <doc_id>}, {"$push": {"reviews": <subdocument>}}, upsert=True)
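Since the question keys documents on `product_id` rather than `_id`, the upsert could be written along these lines with `update_one`. This is only a sketch: the function name `append_review` is made up for illustration, and using `$setOnInsert` to write the product-level fields only when the document is first created is an assumption about the desired behaviour, not something stated in the question.

```python
def append_review(collection, product, review):
    """Append `review` to the product's reviews, creating the product if needed.

    `collection` is a PyMongo collection (or any object with a compatible
    update_one method); `product` is a dict that contains "product_id".
    """
    # Product-level fields are written only when the upsert inserts a new
    # document. "product_id" comes from the filter and "reviews" from $push,
    # so both are left out to avoid conflicting updates on the same path.
    product_fields = {
        key: value
        for key, value in product.items()
        if key not in ("product_id", "reviews")
    }
    update = {"$push": {"reviews": review}}
    if product_fields:
        update["$setOnInsert"] = product_fields
    return collection.update_one(
        {"product_id": product["product_id"]}, update, upsert=True
    )
```

On the returned `UpdateResult`, `upserted_id` is set when a new product document was created and is `None` when an existing one was updated.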
styvane
  • Thanks for your reply. The modified_count from UpdateResult object returned is None and it does not insert any data into MongoDB. I have to do an upsert where if product_id is not found then I add the whole product document, else I just append the reviews part (subdocument) to the existing product document. Please suggest a way in which this can be achieved. – Randeep Apr 23 '15 at 07:29
  • @Randeep to insert new document if `product_id` not found use `upsert=True`. Edited my answer – styvane Apr 23 '15 at 07:36
  • Yep, I got that figured out now. It's working :) Thank you for your timely response. – Randeep Apr 23 '15 at 07:44
  • How does this work with multiple reviews being given? Doesn't this only add 1 subdocument? – Miguel Stevens Oct 25 '18 at 18:42
  • If you want to add multiple reviews, then you also need to use the [`$each`](https://docs.mongodb.com/manual/reference/operator/update/each/) operator. Otherwise it will add the whole list as a single element of the reviews array instead of extending the array – styvane Oct 26 '18 at 14:32
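To illustrate the `$each` point from the last comment, here is a minimal sketch that appends several reviews in a single update. The helper names `push_reviews_update` and `append_reviews` are hypothetical, and matching on `product_id` with `upsert=True` is assumed from the question.

```python
def push_reviews_update(reviews):
    """Build an update document that appends every item in `reviews`."""
    # Without $each, the list itself would be pushed as one array element.
    return {"$push": {"reviews": {"$each": list(reviews)}}}


def append_reviews(collection, product_id, reviews):
    """Append all `reviews` to one product in a single round trip."""
    return collection.update_one(
        {"product_id": product_id},
        push_reviews_update(reviews),
        upsert=True,  # create the product document if it does not exist yet
    )
```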