
I'm trying to group a large data set that matches a query into 10-minute "time slices". I am wondering if there is any way to do this efficiently with MongoDB's aggregation framework.

I have a document that looks like:

{
    _id: ObjectID(""),
    groupID: '1234',
    name: 'dataPointName',
    timestamp: 1432765200000,
    value: 1234
}

I would then like to collect, for each 10-minute interval, an array of [timestamp, value] pairs called "grouped_data". Is there an efficient way to perform all of these operations?

  • To group by 10 min intervals – if your timestamps are milliseconds (long), then you should be able to just divide them by (1000 * 60 * 10) using the `$divide` [aggregation operator](http://docs.mongodb.org/manual/reference/operator/aggregation/) – DNA May 28 '15 at 18:09
  • Please refer to this very similar question with a great answer: http://stackoverflow.com/questions/26814427/group-result-by-15-minutes-time-interval-in-mongodb. Though it is applied to an `ISODate` field, the logic can be reused. – BatScream May 28 '15 at 18:20
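
As the comments suggest, a millisecond timestamp can be bucketed into 10-minute slices with simple integer arithmetic. A plain-JavaScript sketch of that idea (outside of MongoDB, using the timestamps from the question):

```javascript
// Width of one time slice: 10 minutes in milliseconds
var WINDOW_MS = 10 * 60 * 1000;

// Timestamps sharing the same bucket index fall in the same 10-minute slice
function bucketIndex(timestampMs) {
  return Math.floor(timestampMs / WINDOW_MS);
}

// 1432765200000 and 1432765300000 are 100 s apart -> same bucket;
// 1432766100000 is 15 min later -> a different bucket
console.log(bucketIndex(1432765200000)); // 2387942
console.log(bucketIndex(1432765300000)); // 2387942
console.log(bucketIndex(1432766100000)); // 2387943
```

The same division can be expressed in an aggregation pipeline with `$divide`, `$mod` and `$subtract`, as the answers below do in different ways.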

2 Answers


You could start by projecting a new date field, which you can then group on by 10-minute interval.

Suppose you have the following test documents:

db.collection.insert([
    {
        groupID: '1234',
        name: 'dataPointName',
        timestamp: 1432765200000,
        value: 1234
    },
    {
        groupID: '1234',
        name: 'dataPointName',
        timestamp: 1432765300000,
        value: 1234
    },
    {
        groupID: '1234',
        name: 'dataPointName',
        timestamp: 1432766100000,
        value: 1234
    }
])

You can then implement the following aggregation:

db.collection.aggregate([
    {
        "$project": {
            "date": { "$add": [new Date(0), "$timestamp"] },
            "timestamp": 1,
            "value": 1
        }
    },
    { 
        "$group": {
            "_id": {
                "year": { "$year": "$date" },
                "dayOfYear": { "$dayOfYear": "$date" },
                "interval": {
                    "$subtract": [ 
                        { "$minute": "$date" },
                        { "$mod": [{ "$minute": "$date"}, 10 ] }
                    ]
                }
            },
            "grouped_data": { "$push": {"timestamp": "$timestamp", "value": "$value" } }
        }
    },
    {
        "$project":{
            "_id": 0,
            "grouped_data": 1
        }
    }
])

Output:

/* 0 */
{
    "result" : [ 
        {
            "grouped_data" : [ 
                {
                    "timestamp" : 1432766100000,
                    "value" : 1234
                }
            ]
        }, 
        {
            "grouped_data" : [ 
                {
                    "timestamp" : 1432765200000,
                    "value" : 1234
                }, 
                {
                    "timestamp" : 1432765300000,
                    "value" : 1234
                }
            ]
        }
    ],
    "ok" : 1
}
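
The `interval` key in the `$group` stage truncates the minute to the start of its 10-minute slice (`minute - minute % 10`), and `$year`/`$dayOfYear`/`$minute` operate on UTC dates. A plain-JavaScript sketch of the same key computation (assuming UTC timestamps, outside of MongoDB):

```javascript
// Build the same { year, dayOfYear, interval } key the $group stage uses
function groupKey(timestampMs) {
  var d = new Date(timestampMs);
  var startOfYear = Date.UTC(d.getUTCFullYear(), 0, 1);
  // $dayOfYear is 1-based: Jan 1 is day 1
  var dayOfYear = Math.floor((timestampMs - startOfYear) / 86400000) + 1;
  var minute = d.getUTCMinutes();
  return {
    year: d.getUTCFullYear(),
    dayOfYear: dayOfYear,
    interval: minute - (minute % 10) // start minute of the 10-minute slice
  };
}

// 1432765200000 is 2015-05-27T22:20:00Z; 1432766100000 is 22:35 the same day
console.log(groupKey(1432765200000)); // { year: 2015, dayOfYear: 147, interval: 20 }
console.log(groupKey(1432766100000)); // { year: 2015, dayOfYear: 147, interval: 30 }
```

Note that this key only distinguishes slices within one hour if you also group on the hour; the pipeline above relies on `dayOfYear` plus the minute interval, so strictly speaking the `$hour` should also be part of the `_id` for full correctness across a day.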

-- EDIT --

To format the data as arrays like [timestamp,value] rather than as key/value objects, you could use the forEach() method of the aggregation cursor as follows:

var result = [];
db.collection.aggregate(pipeline).forEach(function (doc) {
    var data = []; // declare locally to avoid leaking a global
    doc.grouped_data.forEach(function (obj) {
        data.push(obj.timestamp);
        data.push(obj.value);
    });
    result.push(data);
});

printjson(result);

Output:

[
        [
                1432766100000,
                1234
        ],
        [
                1432765200000,
                1234,
                1432765300000,
                1234
        ]
]
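
If you prefer not to hand-roll the inner loop, the same flattening can be written with `Array.prototype.flatMap` (a plain-JavaScript sketch over hypothetical documents shaped like the aggregation output above):

```javascript
// Hypothetical result documents, shaped like the aggregation output above
var docs = [
  { grouped_data: [{ timestamp: 1432766100000, value: 1234 }] },
  { grouped_data: [
      { timestamp: 1432765200000, value: 1234 },
      { timestamp: 1432765300000, value: 1234 }
  ] }
];

// Each document becomes one flat [timestamp, value, timestamp, value, ...] array
var result = docs.map(function (doc) {
  return doc.grouped_data.flatMap(function (obj) {
    return [obj.timestamp, obj.value];
  });
});

console.log(JSON.stringify(result));
// [[1432766100000,1234],[1432765200000,1234,1432765300000,1234]]
```

`flatMap` needs a modern shell or Node; in an older mongo shell the explicit forEach version above is the safe choice.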
chridam
  • This worked perfectly! Exactly what I was hoping for! Is there any way I can format the data as an array like [timestamp,value] rather than a key/value array? – user2356226 May 28 '15 at 20:26
  • @user2356226 Yes indeed, you can also use the aggregate results cursor to iterate over the objects and push the values to an array, something like the edit above. – chridam May 29 '15 at 06:50

This is clearly an issue that can be solved easily using map-reduce. Your key here will be `timestamp / (10*60*1000)`, rounded down to the nearest integer (i.e. `Math.floor`). You then only need to aggregate your `grouped_data` in the reduce step.

However, this is a little more complicated, as I assume you need to keep your values ordered by timestamp (remember that the reduce function must be associative and commutative, so it cannot guarantee order on its own). To help here, I will use a finalizer to sort the results.

var map = function() {
  var window_width = 10*60*1000;
  emit(Math.floor(this.timestamp / window_width),
       { grouped_data: [[ this.timestamp, this.value ]] });
};

// aggregates values by concatenating the [[timestamp, values]] arrays
// don't bother sorting here as this will be done by the finalizer
var reduce = function(key, values) {
  return values.reduce(
      function(a, b) { return { grouped_data: a.grouped_data.concat(b.grouped_data) }; }
  );
};

// Sort data in each array by timestamp
// (the comparator must *return* the difference, or sort() is a no-op)
var finalize = function(key, reducedValue) {
  return { grouped_data: reducedValue.grouped_data.sort(function(a, b) { return a[0] - b[0]; }) };
};

Producing (with some dummy dataset):

> db.w.mapReduce(map, reduce, { finalize: finalize, out: {inline:1}}).results
[
    {
        "_id" : 2387925,
        "value" : {
            "grouped_data" : [
                [
                    1432755300000,
                    1234
                ],
                [
                    1432755300001,
                    1234
                ]
            ]
        }
    },
    {
        "_id" : 2387942,
        "value" : {
            "grouped_data" : [
                [
                    1432765200000,
                    1234
                ],
                [
                    1432765200001,
                    1234
                ],
                [
                    1432765300000,
                    1234
                ],
                [
                    1432765300001,
                    1234
                ]
            ]
        }
    }
]
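Since map, reduce and finalize are plain JavaScript, the logic can be sanity-checked outside MongoDB. A minimal hand-wired harness (not the server's exact map-reduce semantics) over two dummy documents that land in the same window but arrive out of order:

```javascript
var window_width = 10 * 60 * 1000;

// Same logic as the reduce/finalize functions above, wired by hand
function reduceFn(key, values) {
  return values.reduce(function (a, b) {
    return { grouped_data: a.grouped_data.concat(b.grouped_data) };
  });
}
function finalizeFn(key, reducedValue) {
  reducedValue.grouped_data.sort(function (a, b) { return a[0] - b[0]; });
  return reducedValue;
}

// Two mapped values in the same 10-minute window, emitted out of order
var emitted = [
  { grouped_data: [[1432765300000, 1234]] },
  { grouped_data: [[1432765200000, 1234]] }
];
var key = Math.floor(1432765200000 / window_width); // 2387942

var out = finalizeFn(key, reduceFn(key, emitted));
console.log(JSON.stringify(out.grouped_data));
// [[1432765200000,1234],[1432765300000,1234]]
```

The finalizer restores timestamp order regardless of the order in which reduce concatenated the partial arrays.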
Sylvain Leroux