Converting a MongoDB aggregate into an ArangoDB COLLECT

Question

I'm migrating data from Mongo to Arango and I need to reproduce a $group aggregation. I have successfully reproduced the results but I'm concerned that my approach maybe sub-optimal. Can the AQL be improved?

I have a collection of data that looks like this:

{
    "_id" : ObjectId("5b17f9d85b2c1998598f054e"),
    "department" : [ 
        "Sales", 
        "Marketing"
    ],
    "region" : [ 
        "US", 
        "UK"
    ]
}

{
    "_id" : ObjectId("5b1808145b2c1998598f054f"),
    "department" : [ 
        "Sales", 
        "Marketing"
    ],
    "region" : [ 
        "US", 
        "UK"
    ]
}

{
    "_id" : ObjectId("5b18083c5b2c1998598f0550"),
    "department" : "Development",
    "region" : "Europe"
}

{
    "_id" : ObjectId("5b1809a75b2c1998598f0551"),
    "department" : "Sales"
}

Note the value can be a string, Array or not present

In Mongo I'm using the following code to aggregate the data:

db.test.aggregate([
{
    $unwind:{
        path:"$department",
        preserveNullAndEmptyArrays: true
    }
},
{
    $unwind:{
        path:"$region",
        preserveNullAndEmptyArrays: true
    }
},
{
    $group:{
        _id:{
            department:{ $ifNull: [ "$department", "null" ] },
            region:{ $ifNull: [ "$region", "null" ] },
        },
        count:{$sum:1}
    }
}
])

In Arango I'm using the following AQL:

FOR i IN test
    LET FIELD1=(FOR a IN APPEND([],NOT_NULL(i.department,"null")) RETURN a)
    LET FIELD2=(FOR a IN APPEND([],NOT_NULL(i.region,"null")) RETURN a)

    FOR f1 IN FIELD1
        FOR f2 IN FIELD2
            COLLECT id={department:f1,region:f2} WITH COUNT INTO counter

            RETURN {_id:id,count:counter}

Edit: The APPEND is used to convert string values into an Array

Both produce results that look like this;

{
    "_id" : {
        "department" : "Marketing",
        "region" : "US"
    },
    "count" : 2.0
}

{
    "_id" : {
        "department" : "Development",
        "region" : "Europe"
    },
    "count" : 1.0
}

{
    "_id" : {
        "department" : "Sales",
        "region" : "null"
    },
    "count" : 1.0
}

{
    "_id" : {
        "department" : "Marketing",
        "region" : "UK"
    },
    "count" : 2.0
}

{
    "_id" : {
        "department" : "Sales",
        "region" : "UK"
    },
    "count" : 2.0
}

{
    "_id" : {
        "department" : "Sales",
        "region" : "US"
    },
    "count" : 2.0
}

CodeManX · Accepted Answer · 2018-06-08T10:42:23.417

Your approach seems alright. I would suggest to use TO_ARRAY() instead of APPEND() to make it easier to understand though.

Both functions skip null values, thus it is unavoidable to provide some placeholder, or test for null explicitly and return an array with a null value (or whatever works best for you):

FOR doc IN test
    FOR field1 IN doc.department == null ? [ null ] : TO_ARRAY(doc.department)
    FOR field2 IN doc.region == null ? [ null ] : TO_ARRAY(doc.region)
    COLLECT department = field1, region = field2
    WITH COUNT INTO count
        RETURN { _id: { department, region }, count }

Collection test:

[
  {
    "_key": "5b17f9d85b2c1998598f054e",
    "department": [
      "Sales",
      "Marketing"
    ],
    "region": [
      "US",
      "UK"
    ]
  },
  {
    "_key": "5b18083c5b2c1998598f0550",
    "department": "Development",
    "region": "Europe"
  },
  {
    "_key": "5b1808145b2c1998598f054f",
    "department": [
      "Sales",
      "Marketing"
    ],
    "region": [
      "US",
      "UK"
    ]
  },
  {
    "_key": "5b1809a75b2c1998598f0551",
    "department": "Sales"
  }
]

Result:

[
  {
    "_id": {
      "department": "Development",
      "region": "Europe"
    },
    "count": 1
  },
  {
    "_id": {
      "department": "Marketing",
      "region": "UK"
    },
    "count": 2
  },
  {
    "_id": {
      "department": "Marketing",
      "region": "US"
    },
    "count": 2
  },
  {
    "_id": {
      "department": "Sales",
      "region": null
    },
    "count": 1
  },
  {
    "_id": {
      "department": "Sales",
      "region": "UK"
    },
    "count": 2
  },
  {
    "_id": {
      "department": "Sales",
      "region": "US"
    },
    "count": 2
  }
]

Thanks, it's nice to know I was on the right track. Your version was much cleaner though! — Brendan Wheble, Jun 08 '18 at 08:50

Converting a MongoDB aggregate into an ArangoDB COLLECT

1 Answers1