I am solving this problem:

I am building an IMGUR clone, where users can upload images and there is a 'latest uploads' page that shows the last 1000 uploaded images.

  • users can upload pictures as soon as they sign up, but
  • until the user verifies their email address, their uploads do not show up in 'latest uploads'
  • as soon as the user verifies their email, their images start showing up.
  • if a user is banned, their images do not show up in 'latest uploads'

Originally, each Image contained a User ref. I would select the last 1000 images, populating the User, and then iterate over the returned collection, discarding images owned by banned or unverified users. This breaks when the last 1000 images were all uploaded by unverified users: every one of them is discarded and the page comes up empty.

I am considering using an array of inner Image documents on the User object, but that is not ideal either because a User might own a lot of Images and I do not always want to load them when I load the User object.

I am open to any solution

mkoryak
  • I think you'll need a "join" collection that contains the image id, the user id, date, and the status of the user (and a compound index potentially). When the user status changes, update the collection. Do queries off this collection and then using `$in`, grab the details you need from the other collections. – WiredPrairie Jul 23 '13 at 18:59

1 Answer

I would do the following based on what knowledge I have of your application:

There are two entities that should exist in two different collections: user and uploads.

The uploads collection will be very large, so we want to make sure we can index and shard the collection to handle the scale and performance required by your queries. With that said, some key elements in uploads are:

uploads =
{
  _id: uploadId,
  user: {id: userId, emailverified: true, banned: false},
  ts: uploadTime,
  ...
}

possible indexes:

i. {ts:1,"user.emailverified":1,"user.banned":1} (this index should be multi-purpose)
ii. {"user.id":1,ts:1}
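As a rough sketch, the two indexes could be created from mongosh as shown here. This assumes the flags live under the embedded `user` sub-document, as in the schema above; the helper name `ensureUploadIndexes` is made up for illustration.

```javascript
// Sketch for mongosh: create the two suggested indexes on the uploads collection.
// Assumes the verification and ban flags are stored under the embedded `user` field.
function ensureUploadIndexes(db) {
  return [
    // i. serves the 'latest uploads' query (time range plus status flags)
    db.uploads.createIndex({ ts: 1, "user.emailverified": 1, "user.banned": 1 }),
    // ii. serves per-user lookups ordered by upload time
    db.uploads.createIndex({ "user.id": 1, ts: 1 }),
  ];
}
```

In mongosh you would call it as `ensureUploadIndexes(db)`. Creating an index that already exists with the same definition is a no-op, so it should be safe to run at startup.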

Note that I store some redundant data to optimize your latest-1000 query. The cost is that in the rare case where emailverified or banned has to be updated, you need to run an update on your user collection as well as on your uploads collection (the uploads update will require multi:true, since it touches every upload by that user).
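The two-step update might look something like the sketch below. It is not a definitive implementation: the collection name `users`, the assumption that it carries the same `emailverified` and `banned` fields, and the helper name `setUserStatus` are all illustrative. `updateMany` is the current shell equivalent of an update with multi:true.

```javascript
// Sketch for mongosh: propagate a status change to both collections.
// `status` is e.g. { emailverified: true } or { banned: true }.
// Assumed: a `users` collection whose documents hold the same flag names.
function setUserStatus(db, userId, status) {
  const userFields = {};
  const uploadFields = {};
  for (const key of Object.keys(status)) {
    userFields[key] = status[key];             // field on the user document
    uploadFields["user." + key] = status[key]; // redundant copy on each upload
  }
  // 1. Update the authoritative copy on the user.
  db.users.updateOne({ _id: userId }, { $set: userFields });
  // 2. Update the redundant copy on every upload owned by this user.
  return db.uploads.updateMany({ "user.id": userId }, { $set: uploadFields });
}
```

For example, `setUserStatus(db, userId, { emailverified: true })` would run once when the user confirms their email. The two writes are not atomic, so an upload can briefly carry a stale flag between them.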

query:

db.uploads.find({ts:{$gt:sometime},"user.banned":false,"user.emailverified":true}).sort({ts:-1}).limit(1000)
Dylan Tong