14

I have two collections:

  1. Users
  2. Uploads


Each Upload has a User associated with it, and I need to know that user's details when an Upload is viewed. Is it best practice to duplicate this data inside the Upload record, or to use populate() to pull in these details from the Users collection referenced by _id?


OPTION 1

var UploadSchema = new Schema({
    _id: { type: Schema.ObjectId },
    _user: { type: Schema.ObjectId, ref: 'users'},
    title: { type: String },
});


OPTION 2

var UploadSchema = new Schema({
    _id: { type: Schema.ObjectId },
    user: {
        name: { type: String },
        email: { type: String },
        avatar: { type: String },
        //...etc
    },
    title: { type: String },
});


With 'Option 2', if any of the data in the Users collection changes, I will have to update it across all associated Upload records. With 'Option 1', on the other hand, I can just chill out and let populate() ensure the latest User data is always shown.
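To make the 'Option 2' cost concrete, here is a minimal in-memory sketch in plain JavaScript, with no database involved. The `embeddedUploads` array and the `renameUser` helper are hypothetical, and it assumes each embedded copy also stores the user's `_id` (which the schema above does not show) so the copies can be found again.

```javascript
// Hypothetical in-memory stand-in for the Uploads collection under Option 2.
// Assumption: each embedded user copy also keeps the user's _id.
const embeddedUploads = [
  { _id: 1, title: 'Holiday', user: { _id: 'u1', name: 'Ann', email: 'ann@example.com' } },
  { _id: 2, title: 'Work', user: { _id: 'u2', name: 'Bob', email: 'bob@example.com' } },
  { _id: 3, title: 'Pets', user: { _id: 'u1', name: 'Ann', email: 'ann@example.com' } },
];

// Rewrites every embedded copy of the user's name.
// Returns how many upload records had to be touched.
function renameUser(collection, userId, newName) {
  let touched = 0;
  for (const upload of collection) {
    if (upload.user._id === userId) {
      upload.user.name = newName;
      touched += 1;
    }
  }
  return touched;
}

const touchedCount = renameUser(embeddedUploads, 'u1', 'Anna');
console.log(touchedCount);
```

Against a real database this loop would be a single multi-document update filtered on the embedded user id, so the cost grows with how many uploads a user has and how often their details change.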

Is the overhead of using populate() significant? What is the best practice in this common scenario?

wilsonpage
2 Answers

17

If you need to query on your users, keep users in their own collection. If you need to query on your uploads, keep uploads in their own collection.

Other questions you should ask yourself: Every time I need this data, do I also need the embedded objects (and vice versa)? How often will this data be updated? How often will it be read?

Think about a friendship request: if every time you need the request you also need the user who made it, then embed the request inside the user document.

You will be able to create an index on the embedded object too, and your search will be a single query: fast and consistent.
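As a rough illustration of the friendship-request case, here is a plain JavaScript sketch of what such a document could look like. The field names (`requests`, `to`, `status`) and the `pendingRequests` helper are hypothetical, chosen only for this example.

```javascript
// Hypothetical user document with friendship requests embedded in it.
const requester = {
  _id: 'u1',
  name: 'Ann',
  requests: [
    { to: 'u2', status: 'pending' },
    { to: 'u3', status: 'accepted' },
  ],
};

// One read of the user document yields both the requester and the requests,
// so no second lookup is needed to know who made each request.
function pendingRequests(user) {
  return user.requests
    .filter((r) => r.status === 'pending')
    .map((r) => ({ from: user.name, to: r.to }));
}

const pending = pendingRequests(requester);
console.log(pending);
```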


Just a link to my previous reply on a similar question: Mongo DB relations between objects

I think this post will be right for you http://www.mongodb.org/display/DOCS/Schema+Design

Use Cases

Customer / Order / Order Line-Item

Orders should be a collection. Customers should be a collection. Line-items should be an array of line-items embedded in the order object.

Blogging system.

Posts should be a collection. The post author might be a separate collection, or simply a field within posts if it is only an email address. Comments should be embedded objects within a post for performance.

Schema Design Basics

Kyle Banker, 10gen

http://www.10gen.com/presentation/mongosf2011/schemabasics

Indexing & Query Optimization

Alvin Richards, Senior Director of Enterprise Engineering

http://www.10gen.com/presentation/mongosf-2011/mongodb-indexing-query-optimization

These two videos are the best I have ever seen on MongoDB, IMHO.

kilianc
  • So how does one go about updating the duplicated data? In the blogging example, say we duplicate some of the user's information (e.g. name) in the comment record for quick single queries. If that user changes their name, we need to update every instance of their name across the database, in this case every comment they have made. – wilsonpage Nov 10 '11 at 09:24
  • You need to ask yourself: how often will a username change? Maybe a mass update will work in your case, or simply store an ObjectId reference to the user document. Object embedding is an option, not a MongoDB dogma. – kilianc Nov 12 '11 at 20:59
3

populate() is just a query, so its overhead is whatever that query costs: a find() on the referenced model. That said, best practice for MongoDB is to embed what you can, because it results in a faster query. It sounds like you'd be duplicating a ton of data though, which makes relations (linking) a good fit.

"Linking" just means storing the ObjectId of a document from another model in a field.
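To show what "just a query" means in practice, here is a minimal in-memory sketch in plain JavaScript. It is not Mongoose's actual implementation; `userDocs`, `uploadDocs`, `findUsersByIds` and `populateUser` are hypothetical names, and the assumption is that the linked ids are collected and resolved with one extra lookup against the users collection.

```javascript
// Hypothetical in-memory stand-ins for the two collections (Option 1 shape).
const userDocs = [
  { _id: 'u1', name: 'Ann' },
  { _id: 'u2', name: 'Bob' },
];
const uploadDocs = [
  { _id: 1, _user: 'u1', title: 'Holiday' },
  { _id: 2, _user: 'u2', title: 'Work' },
  { _id: 3, _user: 'u1', title: 'Pets' },
];

let lookups = 0; // counts simulated round trips to the users collection

// Simulates one find() on users, filtered by a list of ids.
function findUsersByIds(ids) {
  lookups += 1;
  return userDocs.filter((u) => ids.includes(u._id));
}

// Replaces each upload's _user id with the matching user document.
function populateUser(docs) {
  const ids = [...new Set(docs.map((d) => d._user))];
  const byId = new Map(findUsersByIds(ids).map((u) => [u._id, u]));
  return docs.map((d) => ({ ...d, _user: byId.get(d._user) || null }));
}

const populated = populateUser(uploadDocs);
console.log(populated[2]._user.name, lookups);
```

In this sketch the extra cost is one additional lookup per populated field, regardless of how many uploads are returned, and the user data is always read fresh from the users collection.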

Here is the Mongo Best Practices http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesign-SummaryofBestPractices

Linking/DBRefs http://www.mongodb.org/display/DOCS/Database+References#DatabaseReferences-SimpleDirect%2FManualLinking

Chris Biscardi
  • My schema is more complicated than this; I just simplified it for this example. Users can be members of multiple groups, and the Uploads belong to the group rather than the user. Your answer doesn't answer my question either: "Is the overhead of using populate() significant? What is the best practice in this common scenario?" – wilsonpage Nov 04 '11 at 09:38
  • Updated my answer to be more relevant. – Chris Biscardi Nov 04 '11 at 10:41