2

I'm using MongoDB for my database. The query that I'm currently working on revealed a possible deficiency in my schema. Below is the relevant layout of my collections. Note that games.players is an array of 2 players since the game is chess.

users {_id, username, ...}
games {_id, players[], ...}
msgs {_id, username, gameid, time, msg}

The data that I need is:

All msgs for games which a user is in which is newer than a given timestamp.

In a SQL database, my query would look similar to:

SELECT * FROM msgs WHERE time>=$time AND gameid IN
    (SELECT _id FROM games WHERE players=$username);

But, Mongo isn't a relational database, so doesn't support sub-queries or joins. I see two possible solutions. What would be better performance-wise and efficiency-wise?

  1. Multiple Queries
    • Select games the user is in, then use $in to match msgs.gameid by.
    • Other?
  2. Normalization
    • Make users.games contain all games a user is in.
    • Copy games.players to msgs.players by msgs.gameid
    • etc.,
Justin
  • 555
  • 4
  • 22

2 Answers2

2

I'm a relative newbie to MongoDB, but I find my self frequently using a combination of the two approaches. Some things - e.g. user names - are frequently duplicated to simplify queries used for display, but any time I need to do more than display information, I wind up writing multiple queries, sometimes 2 or 3 levels deep, using $in, to gather all the documents I need to work with for a given operation.

Chris Nitchie
  • 537
  • 5
  • 9
0

You can "normalize" yourself. I would add an array to users that list the games he is a member of;

users {_id, username, games={game1,game2,game3}}

now you can do a query on msgs where the time>time$ and the {games._id "is in" users.games}

You will have to maintain the games list on each user.

Michael Gray
  • 611
  • 1
  • 5
  • 13
  • That would work, but that means users.games grows linearly. You made me think of the reverse, copy the games.players into msgs. Then it's just match by time and user. – Justin Sep 13 '11 at 02:27
  • except that there is only one record per user in users. Msgs has a document per msg, so coping games.player puts a large subobject in each msg (lots more data). Plus the members of game may change after the message is sent. Storing the "games" inside "user" means you don't really need to store players inside games. The "games" field inside the user can be indexed. – Michael Gray Sep 16 '11 at 09:40
  • Ok, I should've clarified, games.player is a constant of size 2 since this is for a chess game. Would this change your recommendation? I think your method requires 2 queries because there's no join, but mine only requires 1. – Justin Sep 16 '11 at 21:34