I have documents like this one in collection x in MongoDB:

{
    "_id" : ...
    "attrKeys": [ "A1", "A2" ],
    "attrs" : {
        "A1" : {
            "type" : "T1",
            "value" : "13"
        },
        "A2" : {
            "type" : "T2",
            "value" : "14"
        }
    }
}

The A1 and A2 elements above are just examples: the attrs field may hold any number of keys with any names. The key names in attrs are stored in the attrKeys field.

I would like to use the MongoDB aggregation framework to transform that document into one like this:

{
    "_id" : ...
    "attrs" : [
        {   
            "key": "A1",
            "type" : "T1",
            "value" : "13"
        },
        {   
            "key": "A2",
            "type" : "T2",
            "value" : "14"
        }
    ]
}

That is, attrs should become an array whose elements are the values of the original keys, with each key name "passed" into a new field named key inside the corresponding array element.

Is it possible to use the aggregation framework for such a transformation? I tend to think that the $project operator could be used, but I haven't figured out how.

DaBler
fgalan
  • Having unknown keys is a dangerous anti-pattern in MongoDB. – Philipp Apr 21 '15 at 09:13
  • A little bit off-topic, but I'll try to clarify :) Actually, the key is `attrs` and it is perfectly known. Another issue is that I need to use keys inside `attrs` in order to process concurrent updates on attributes, e.g. `{$set: {attrs.A1: {...}}}` and `{$set: {attrs.A2: {...}}}`, which could be very difficult to manage if attributes were stored as an array. – fgalan Apr 21 '15 at 09:38
  • Actually, keys are not completely unknown, they are stored in the `attrKeys` field. I have edited the question to include that information (I hadn't included it in my original post, thinking `attrKeys` was not meaningful for solving the question, sorry for that). – fgalan Apr 21 '15 at 09:44

2 Answers

As @Philipp rightly mentioned in his comment:

Having unknown keys is a dangerous anti-pattern in MongoDB

However, if you knew beforehand what the keys are then you could use the aggregation operators $literal, $addToSet and $setUnion to get the desired result. The aggregation pipeline would be like:

db.collection.aggregate([
    {
        "$project": {

            "attrs.A1.key": { "$literal": "A1" },
            "attrs.A1.type": "$attrs.A1.type",
            "attrs.A1.value": "$attrs.A1.value",
            "attrs.A2.key": { "$literal": "A2" },
            "attrs.A2.type": "$attrs.A2.type",
            "attrs.A2.value": "$attrs.A2.value"
        }
    },
    {
        "$group": {
            "_id": "$_id",
            "A1": { "$addToSet": "$attrs.A1" },
            "A2": { "$addToSet": "$attrs.A2" }
        }
    },
    {
        "$project": {
            "attrs": {
                "$setUnion": [ "$A1", "$A2" ]
            }
        }
    }
])

Result:

/* 0 */
{
    "result" : [ 
        {
            "_id" : ObjectId("55361320180e849972938fea"),
            "attrs" : [ 
                {
                    "type" : "T1",
                    "value" : "13",
                    "key" : "A1"
                }, 
                {
                    "type" : "T2",
                    "value" : "14",
                    "key" : "A2"
                }
            ]
        }
    ],
    "ok" : 1
}
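
The pipeline above hard-codes A1 and A2. As a minimal sketch, assuming the key list is already available in client code, the same three stages could be generated rather than typed by hand. The helper name `buildAttrsPipeline` is hypothetical and is not part of any MongoDB API; it only builds plain objects.

```javascript
// Hypothetical helper (not a MongoDB API): builds the same three-stage
// pipeline as above for an arbitrary list of known attribute keys.
// Assumption: no key is named "_id" or contains "." or "$", since each key
// is used both in a dotted field path and as a $group output field name.
function buildAttrsPipeline(keys) {
    var project = {};
    var group = { "_id": "$_id" };

    keys.forEach(function (key) {
        var path = "attrs." + key;
        project[path + ".key"] = { "$literal": key };   // copy the key name in
        project[path + ".type"] = "$" + path + ".type";
        project[path + ".value"] = "$" + path + ".value";
        group[key] = { "$addToSet": "$" + path };       // one set per key
    });

    return [
        { "$project": project },
        { "$group": group },
        { "$project": {
            "attrs": {
                "$setUnion": keys.map(function (key) { return "$" + key; })
            }
        }}
    ];
}

var pipeline = buildAttrsPipeline(["A1", "A2"]);
```

The result could then be passed to `db.collection.aggregate(pipeline)`. This still requires the keys to be known before the pipeline is built, so it does not remove the limitation discussed in the comments.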
chridam
  • Keys are known beforehand in the `attrKeys` field (I have updated my question to include it, sorry for not doing so in my original post). – fgalan Apr 21 '15 at 09:46
  • They are not known beforehand - they are determined at the time you touch the document and can see `attrKeys`. – wdberkeley Apr 27 '15 at 16:38

The aggregation framework is not how you handle the transformation here. You might have been looking to the $out operator to be of some help when re-writing your collection, but the aggregation framework cannot do what you are asking.

Basically the aggregation framework lacks the means to access "keys" dynamically by using a "data point" in any way. You can process data in its current shape with mapReduce, but that is generally not as efficient as using the aggregation framework, which is mostly why you seem to be here in the first place, since someone pointed out that the revised structure is better.

Also, trying to use mapReduce as a way to "re-shape" your collection for storage is generally not a good idea. MapReduce output is essentially "always" "key/value", which means the output you get is always going to be contained under a mandatory "value" field.

This really means changing the contents of the collection, and the only way you can really do that while using the values present in your document is by "reading" the document content and then "writing" it back.

The looping nature of this is best handled using the "Bulk" operations API methods:

var bulk = db.collection.initializeOrderedBulkOp(),
    count = 0;

db.collection.find({ "attrKeys": { "$exists": true }}).forEach(function(doc) {
   // Re-map attrs into an array, copying each key name into a "key" field
   var attrs = doc.attrKeys.map(function(key) {
       return {
           "key": key,
           "type": doc.attrs[key].type,
           "value": parseInt(doc.attrs[key].value, 10)  // string -> integer
       };
   });

   // Queue update operation; the "attrKeys" check skips already converted documents
   bulk.find({ "_id": doc._id, "attrKeys": { "$exists": true } })
       .updateOne({ 
           "$set": { "attrs": attrs },
           "$unset": { "attrKeys": 1 }
       });
   count++;

   // Execute every 1000 queued operations, then start a new batch
   if ( count % 1000 === 0 ) {
       bulk.execute();
       bulk = db.collection.initializeOrderedBulkOp();
   }
});

// Drain any remaining queued operations
if ( count % 1000 !== 0 )
    bulk.execute();
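
The re-mapping step in the loop above can be checked without touching a database. The following is a minimal sketch in plain JavaScript: the function name `buildAttrsUpdate` and the `sample` document are hypothetical, introduced only for illustration, and the function mirrors the mapping above, including the string-to-integer conversion.

```javascript
// Hypothetical helper: given a document in the original shape, returns the
// update document that the bulk loop above would queue for it.
// Assumption: every name in doc.attrKeys exists in doc.attrs; a missing key
// would throw a TypeError here, just as it would in the loop above.
function buildAttrsUpdate(doc) {
    var attrs = doc.attrKeys.map(function (key) {
        var attr = doc.attrs[key];
        return {
            "key": key,
            "type": attr.type,
            "value": parseInt(attr.value, 10)   // "13" -> 13
        };
    });
    return {
        "$set": { "attrs": attrs },
        "$unset": { "attrKeys": 1 }
    };
}

// Sample document in the shape shown in the question (the _id is made up).
var sample = {
    "_id": 1,
    "attrKeys": ["A1", "A2"],
    "attrs": {
        "A1": { "type": "T1", "value": "13" },
        "A2": { "type": "T2", "value": "14" }
    }
};

var update = buildAttrsUpdate(sample);
```

Keeping the mapping in a pure function like this makes it easy to try on a few representative documents before running the bulk update against the whole collection.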

Once you have updated the collection content (and please note that your "value" fields there have also been changed from "string" to "integer" format), you can do useful aggregation operations on your new structure, such as:

db.collection.aggregate([
    { "$unwind": "$attrs" },
    { "$group": {
        "_id": null,
        "avgValue": { "$avg": "$attrs.value" }
    }}
])
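
To see what that pipeline is expected to compute, here is a rough plain-JavaScript model of the `$unwind` and `$group` stages, assuming the documents are already in the converted shape. The function name `averageAttrValue` and the sample data are hypothetical; this is an illustration of the arithmetic, not how the server evaluates the pipeline.

```javascript
// Rough model of { "$unwind": "$attrs" } followed by a $group with $avg.
// Simplification: this always returns one object, whereas the real pipeline
// returns no document at all when nothing reaches the $group stage.
function averageAttrValue(docs) {
    var sum = 0, count = 0;
    docs.forEach(function (doc) {
        // "$unwind": visit each array element as if it were its own document
        (doc.attrs || []).forEach(function (attr) {
            // "$avg" only takes numeric values into account
            if (typeof attr.value === "number") {
                sum += attr.value;
                count++;
            }
        });
    });
    return { "_id": null, "avgValue": count > 0 ? sum / count : null };
}

// Converted form of the sample document from the question.
var converted = [
    { "_id": 1, "attrs": [
        { "key": "A1", "type": "T1", "value": 13 },
        { "key": "A2", "type": "T2", "value": 14 }
    ]}
];

var summary = averageAttrValue(converted);
```

This also illustrates why the conversion to integers matters: had the values been left as strings, none of them would count as numeric and no meaningful average would come out.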
Kit
Blakes Seven