0

So I have a MongoDB document that tracks logons to our app. Basic structure appears thusly:

 [_id] => MongoId Object
        (
            [$id] => 50f6da28686ba94b49000003
        )

    [userId] => 50ef542a686ba95971000004
    [action] => login
    [time] => 1358354984

Now- the challenge is this: there are about 20,000 of these entries. I have been challenged to look at the number of times each user logged in (as defined by userId)...so I am looking for a good way to do this. There are a couple of possible approaches that I've seen (in SQL, for example, I might pull down number of logins by grouping by UserID and doing a count on it- something like SELECT userID, count(*) from....group by UserId...and then sub-selecting on that (CASE WHEN or something in the top select). Anyways- wondering if anyone has any suggestions on the best way to do this. Worst case scenario I can limit the result set and do the grouping in memory- but ideally would like to get the full answer directly from Mongo. The other limitation (even after I get past the first set) is that I am looking to do a unique count by date...which will be even tougher!

ekad
  • 14,436
  • 26
  • 44
  • 46

3 Answers3

2

Now- the challenge is this: there are about 20,000 of these entries.

At 20,000 you will probably be better off with the aggregation framework ( http://docs.mongodb.org/manual/applications/aggregation/ ):

$db->user->aggregate(array(
    array( '$group' => array( '_id' => '$userId', 'num_logins' => array( '$sum' => 1 ) ) )
));

That will group ( http://docs.mongodb.org/manual/reference/aggregation/#_S_group ) by userId and count (sum: http://docs.mongodb.org/manual/reference/aggregation/sum/#_S_sum ) the amount of grouped login there are.

Note: As stated in the comments, the aggregate helper is in version 1.3+ of the PHP driver. Before version 1.3 you must use the command function directly.

Sammaye
  • 43,242
  • 7
  • 104
  • 146
  • Keep in mind this is only supported in recent versions of the mongodb driver. – datasage Jan 31 '13 at 19:33
  • Yeah- I tried to upgrade to 1.3 for the install driver- but without setSlaveOkay() I am struggling- so I reverted. Might be time to figure that out! – Fernando Pombeiro Jan 31 '13 at 19:44
  • @FernandoPombeiro Slaveokay should be set via the read preferences now http://php.net/manual/en/mongo.readpreferences.php you wanna use: `RP_SECONDARY` – Sammaye Jan 31 '13 at 20:08
  • Yeah- having issues with RP_SECONDARY...getting the old "Not master" error...need to work through that at some point :-) – Fernando Pombeiro Jan 31 '13 at 20:26
  • @FernandoPombeiro poop well in the meantime you can always run this through the command as shown here: http://stackoverflow.com/questions/11290809/mongodb-aggregation-php – Sammaye Jan 31 '13 at 20:34
0

You can use MapReduce to group the results by user ID

http://docs.mongodb.org/manual/applications/map-reduce/#map-reduce-examples

Or you can use the Group method:

db.logins.aggregate(
    { $group : {
        _id : "$userId",
        loginsPerUser : { $sum : 1 }
    }}
);

For MongoDB 20K or even more won't be a problem to walk and combine them so no worries about performance.

Alex Rashkov
  • 9,833
  • 3
  • 32
  • 58
  • Yeah- I thought about that...but I am getting killed on timeout (there are over 20,000 keys- which means that the PHP group function is nixed). – Fernando Pombeiro Jan 31 '13 at 19:33
0

http://docs.mongodb.org/manual/reference/command/group/

db.user.group({key: {userId: 1}, $reduce: function ( curr, result ) { result.total++ }, initial: {total: 0}});

I ran this on 191000 rows in just a couple seconds but group is limited to 20,000 unique entries so it really isn't a solution for you.

John McKnight
  • 684
  • 4
  • 9