0

I have documents that look like this

{
    "field1" : "value",
    "field3" : "value",
    "attributes" : {
        "key1" : {
            "category" : "4",
            "value" : "value"
        },
        "key2" : {
            "category" : "5",
            "value" : "value"
        },
    }
}

I want to transform these to look like the structure below (basically it's taking a dictionary and flattening it into an array). Note the number of keys in attributes varies.

{
    "field1" : "value",
    "field3" : "value",
    "attributes" : [
        {
            "key" : "key1",
            "category" : "4",
            "value" : "value"
        },
        {
            "key" : "key2",
            "category" : "5",
            "value" : "value"
        },
    ]
}

Because the number of documents is significant, I would like to use the bulk APIs. I have this basic structure to my script.

db = db.getSiblingDB('MyDB');

db.MyCollection.intializeOrderedBukOp(),
var bulk = db.MyCollection.intializeOrderedBukOp(),
    count = 0;

db.MyCollection.find({}).forEach(function(doc) {
   // Re-map attrs (what follows is some pseudocode
   var attrs = function TransformSomehow(doc.attributes) {
           // return an array...
       };
   });

   // Queue update operation
   bulk.find({ "_id": doc._id })
       .updateOne({ 
           "$set": { "attrs": attrs } }
       });
   count++;

   // Execute every 1000
   if ( count % 1000 == 0 ) {
       bulk.execute();
       bulk = db.collection.intializeOrderedBukOp();
   }
});

// Drain any queued remaining
if ( count % 1000 != 0 )
    bulk.execute();

This was borrowed from this answer, but I can't use it as is because it's going the opposite direction (array to object). It appears that I can't use map to do this.

Kit
  • 20,354
  • 4
  • 60
  • 103

2 Answers2

1

This is one way to do bulk write using the MongoDB NodeJS driver v3.6 and MongoDB v4.2.3. This uses a simple batch to update all the collection documents. The update transforms the object (or dictionary) into an array (as per your requirement) for each document.

const collection = db.collection('test');

const update1 = { 
    updateMany: { 
        filter: { }, 
        update: [ 
            { $set: { 
                 attributes: { 
                     $map: { 
                         input: { $objectToArray: "$attributes" }, 
                         in: { $mergeObjects: [ { "key": "$$this.k" }, "$$this.v" ] } 
                     } 
                 } 
            } } 
        ] 
    } 
};

collection.bulkWrite( [ update1 ],
                      { ordered: false },
                      ( err, result ) => {
                          console.log( 'Updated and modified count: ', result.matchedCount, result.modifiedCount );
                       }
)


Reference: CRUD Operations - Bulk Write

prasad_
  • 12,755
  • 2
  • 24
  • 36
0

@prasad_'s answer works. For those that want to apply this in the MongoDB console without NodeJS, I translated his solution to this.

col = db.getCollection('X');

update1 = { 
    updateMany: { 
        filter: { }, 
        update: [ 
            { $set: { 
                 attributes: { 
                     $map: { 
                         input: { $objectToArray: "$attributes" }, 
                         in: { $mergeObjects: [ { "key": "$$this.k" }, "$$this.v" ] } 
                     } 
                 } 
            } } 
        ] 
    } 
}

col.bulkWrite( [ update1 ], { ordered: false })
Kit
  • 20,354
  • 4
  • 60
  • 103