I am relatively new to MongoDB, and so far am really impressed. I am struggling with the best way to setup my document stores though. I am trying to do some summary analytics using twitter data and I am not sure whether to put the tweets into the user document, or to keep those as a separate collection. It seems like putting the tweets inside the user model would quickly hit the limit with regards to size. If that is the case then what is a good way to be able to run MapReduce across a group of user's tweets?
I hope I am not being too vague but I don't want to get too specific and too far down the wrong path as far as setting up my domain model.
As I am sure you are all bored of hearing, I am used to RDB land where I would lay out my schema like
| USER |
--------
|ID
|Name
|Etc.
|TWEET__|
---------
|ID
|UserID
|Etc
It seems like the logical schema in Mongo would be
User
|-Tweet (0..3000)
|-Entities
|-Hashtags (0..10+)
|-urls (0..5)
|-user_mentions (0..12)
|-GeoData (0..20)
|-somegroupID
but wouldn't that quickly bloat the User document beyond capacity. But I would like to run analysis on tweets belonging to users with similar somegroupID. It conceptually makes sense to to the model layout as above, but at what point is that too unweildy? And what are viable alternatives?