Collection data modeling in MongoDB

Question

I want to design a model for profile interactions, for example A <-> interact <-> B, where the interaction contains fields common to both A and B.
Let's say I have a collection called Interactions. I have a few ideas in mind and I am looking for the best-practice solution.

1. Separate the interaction into two documents, one for each profile:
```
{
  pid: "A ID",
  commonField1: "",
  commonField2: "",
  // ...
}
{
  pid: "B ID",
  commonField1: "",
  commonField2: "",
  // ...
}
```
Pros: fast reads.
Cons: every update to a common field must be applied to both documents.

2. Maintain a single document for the interaction:
```
{
  pids: ["A ID", "B ID"],
  commonField1: "",
  commonField2: "",
  // ...
}
```
Pros: each common field is updated only once.
Cons: reads are trickier.

The thing is, there are a lot of reads but also a lot of updates, and this collection must scale to many millions of documents.

Common queries in my scenario:

- retrieve a profile's interactions
- update a specific interaction of a profile

I am leaning toward the second choice, relying on a multikey index on pids for fast document lookup and getting a single update for each frequent change.
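For the second option, the two common queries could be expressed roughly as below. This is a minimal sketch that only builds filter objects to pass to the shell or a driver; the helper names are hypothetical, and it assumes a multikey index created with `db.Interactions.createIndex({ pids: 1 })`. An equality filter on an array field matches any element, so a single filter finds a profile's interactions whichever slot its ID occupies.

```javascript
// Option 2 sketch: one document per interaction, both profile IDs in `pids`.
// Assumes a multikey index: db.Interactions.createIndex({ pids: 1 })
// Helper names are hypothetical; they only build query filters.

// "Retrieve profile interactions": equality on an array field matches
// any element, so this finds the profile in either position.
function pidsFilterForProfile(pid) {
  return { pids: pid };
}

// "Update specific profile interaction": $all requires both IDs to be
// present in the array, regardless of their order.
function pidsFilterForPair(a, b) {
  return { pids: { $all: [a, b] } };
}
```

In the shell this might be used as `db.Interactions.updateOne(pidsFilterForPair("A ID", "B ID"), { $set: { commonField1: "x" } })`.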

I have no experience with sharded collections, but I have noticed that a multikey index is not supported as a shard key. Should that be a showstopper for the second choice?
Will reads be fast enough with that kind of index? And are there any other options for my use case?

Your answers are highly appreciated.

score 0 · Answer 1 · answered Mar 24 '14 at 02:50

I think the latter format makes more sense to avoid duplication of updates.

For the interaction pairs you should be using a compound key (an embedded document) rather than an array. A compound key can be used for _id and as a shard key (arrays are not valid for either).

So the document might look something like:

```
{
    _id: { pid1: 'A', pid2: 'B' },
    commonField1: '',
    commonField2: '',
}
```

If you want to avoid duplicate pairs, you could sort your IDs in a predictable order. For example, pid1 might always be the lesser of the two values.
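Sorting the IDs could be done in a small helper like the one below. This is a minimal sketch assuming profile IDs are strings compared with JavaScript's default `<` ordering; `interactionId` and `updateInteractionOp` are hypothetical names. The field order of the returned object matters too, because a MongoDB equality match on an embedded document (such as the whole `_id`) requires the same fields in the same order.

```javascript
// Hypothetical helper: canonical _id for the pair (A, B) regardless of
// argument order. Assumes string profile IDs compared with JS `<`.
function interactionId(a, b) {
  // pid1 is always the lesser ID, so (A, B) and (B, A) map to the same _id.
  const [pid1, pid2] = a < b ? [a, b] : [b, a];
  // Keep pid1 before pid2: equality on an embedded document is order-sensitive.
  return { pid1: pid1, pid2: pid2 };
}

// "Update specific profile interaction": one update against one document.
function updateInteractionOp(a, b, changes) {
  return { filter: { _id: interactionId(a, b) }, update: { $set: changes } };
}

console.log(JSON.stringify(interactionId("B", "A"))); // {"pid1":"A","pid2":"B"}
```

Usage in the shell might look like `const op = updateInteractionOp("B ID", "A ID", { commonField1: "x" }); db.Interactions.updateOne(op.filter, op.update);`, which updates the common fields exactly once.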

The default _id index will allow you to efficiently look up either (pid1,pid2) or (pid1) interactions, but you'll probably want to add an extra index on {'_id.pid2': 1}.
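Retrieving all interactions for one profile has to check both positions, since a profile may be stored as either `pid1` or `pid2`. The sketch below builds such a filter; the helper name is hypothetical. One caveat worth verifying with `explain()`: an index on `_id` indexes the embedded document as a whole, so a dotted `'_id.pid1'` clause like the one here is likely to need its own index alongside `{'_id.pid2': 1}`.

```javascript
// Hypothetical helper for "retrieve profile interactions" with the
// { _id: { pid1, pid2 } } layout. A profile may sit in either slot,
// so each branch of the $or checks one position.
function profileInteractionsFilter(pid) {
  return { $or: [{ "_id.pid1": pid }, { "_id.pid2": pid }] };
}

// Indexes for the two $or branches (run in the mongo shell):
//   db.Interactions.createIndex({ "_id.pid1": 1 })
//   db.Interactions.createIndex({ "_id.pid2": 1 })
// Check which index each branch actually uses with:
//   db.Interactions.find(profileInteractionsFilter("A ID")).explain()

console.log(JSON.stringify(profileInteractionsFilter("A")));
// {"$or":[{"_id.pid1":"A"},{"_id.pid2":"A"}]}
```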