
I have a MongoDB database into which multiple sensors dump their data once a day. Each document is in essence: { sid, date, data } (sensor id, date as a Date (I only use the date component), and a data array of a couple hundred values).

Now I want to get an overview statistic: for how many sensors do I have data on each day? This aggregation works nicely while I have a few dozen documents, but once I have a couple of hundred documents, the query never finishes.

function dailyStatistic(callback) {
    return air
        .aggregate([
            { $match: {} },
            { $group: { _id: { date: '$date' }, myCount: { $sum: 1 } } }
        ])
        .allowDiskUse(true);
}

air is the name of my mongoose collection.
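
For reference, the model looks roughly like this (a minimal sketch; the field types below are assumptions reconstructed from the description, simplified from my actual schema):

const mongoose = require('mongoose');

// One document per sensor per day: { sid, date, data }.
const airSchema = new mongoose.Schema({
    sid: Number,    // sensor id
    date: Date,     // only the date component is used
    data: [Number]  // a couple hundred values
});

const air = mongoose.model('air', airSchema);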

The aggregation should really just return:

[ { date: 2017-08-07, myCount: 10 }, { date: 2017-08-08, myCount: 26 } ]

Now when I watch the machine (via glances), I get CPU_IOWAIT and MEMSWAP errors that ultimately kill the node.js process before it gets the data.

When I check out the collection in Robomongo, I can easily browse the different data points. But even in Robomongo, this script never gets me a result:

db.getCollection('air').find({}).length()

Any ideas? Thanks, Andreas

  • Do you have an index on date? Also, the empty $match is useless. – tronic Aug 12 '17 at 07:22
  • I don't think it's an index issue. Mongo can work nicely with millions of documents even without an index; it will take longer, but it still works. How much RAM does your machine have? I would suggest taking a look at the mongo configs, and if everything is OK there, look at the connection string. – Mykola Borysyuk Aug 12 '17 at 08:16
  • I have 8 GB RAM. There is nothing unusual in my MongoDB configuration; I could not find any option to limit RAM / cache consumption in MongoDB though (see the note after these comments). – awb99 Aug 14 '17 at 06:11
  • I believe it has to do with the size of each document. I will start reducing the size of the documents until it works, and comment further. – awb99 Aug 14 '17 at 11:33
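
Note: a cache cap does exist, assuming the WiredTiger storage engine (the default since MongoDB 3.2). A sketch of the startup flag and its mongod.conf equivalent:

mongod --wiredTigerCacheSizeGB 1

# mongod.conf equivalent:
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1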

2 Answers


You probably do not have an index on date. Create one with: db.getCollection('air').createIndex({ date: 1 })

db.getCollection('air').find({}).length() fetches and iterates over all the documents just to count them.

Instead use db.getCollection('air').count({})
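
From Node with mongoose, the equivalent is roughly the following (a sketch; count() was current at the time, and countDocuments() / estimatedDocumentCount() supersede it in mongoose 5+):

// Count server-side; no documents are fetched or parsed.
air.count({}, function (err, n) {
    if (err) return console.error(err);
    console.log('documents in air:', n);
});

// In mongoose 5+:
// air.estimatedDocumentCount().then(n => console.log(n));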

tronic
  • count works! Thanks! I have exactly 3500 documents in the collection. Even the find().length() call gets killed before returning the number. – awb99 Aug 14 '17 at 06:03

The best way to do this without crashing MongoDB is to fetch data for a date range, in your case one day at a time.

function dailyStatistic(dateMin, dateMax, callback) {
    return air
        .aggregate([
            // Only scan the requested date range (this is where an index on date pays off).
            { $match: { date: { $gte: dateMin, $lte: dateMax } } },
            {
                $project: {
                    sid: 1,
                    date: 1,
                    data: 1,
                    day: { $dayOfMonth: "$date" }, // the operator is $dayOfMonth, not $day
                    month: { $month: "$date" },
                    year: { $year: "$date" }
                }
            },
            { $group: { _id: { day: "$day", month: "$month", year: "$year" }, myCount: { $sum: 1 } } }
        ])
        .allowDiskUse(true);
}
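
For illustration, a usage sketch for a single day (the UTC day boundaries and the exec() call are assumptions about the surrounding code, not part of the answer):

const dayStart = new Date(Date.UTC(2017, 7, 7));     // 2017-08-07T00:00:00Z
const dayEnd   = new Date(Date.UTC(2017, 7, 8) - 1); // last millisecond of that day

dailyStatistic(dayStart, dayEnd).exec(function (err, results) {
    if (err) return console.error(err);
    console.log(results); // e.g. [ { _id: { day: 7, month: 8, year: 2017 }, myCount: 10 } ]
});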

You can take this further by adding pagination when the number of records per hour/minute is also too large.

And as pagetronic suggested, create the indexes if you haven't.

Naveen Paul
  • Thanks! My collection only has 3500 documents. I think it should really be able to aggregate those without any index or pagination. – awb99 Aug 14 '17 at 06:02
  • When your data grows, indexing makes a lot of difference. But did you find a solution to your issue? – Naveen Paul Aug 14 '17 at 09:13
  • Hi Paul! No, my issue is still unsolved. :-( I have changed the datatype of date from Date to Number (using a unix integer timestamp now). This seems to have helped a little. I have also cleaned up the document JSON structure (removed fields that I no longer need). This also helped a little. I guess it has something to do with JSON parsing being CPU and memory intensive, but I am not completely sure. – awb99 Aug 15 '17 at 14:29
  • OK, I have made some changes to the query. When grouping by date, the grouping happens by full timestamp, so you might see a separate count for each timestamp. To overcome this, you first have to extract the day, month and year, and then group on those. I haven't tested this, but it should give you the correct results. – Naveen Paul Aug 15 '17 at 17:22
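
An alternative to extracting day/month/year separately: since MongoDB 3.0, $dateToString can format the date into a per-day grouping key in one step. A sketch (not the answer's tested code):

db.getCollection('air').aggregate([
    { $group: {
        // Format the timestamp as a calendar day so all readings from one day group together.
        _id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
        myCount: { $sum: 1 }
    } }
])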