I am creating an API using PHP and MongoDB. The system has users, and each user can upload files. Users can also "follow" each other.

I need to return a feed of all the latest files uploaded by users that the authenticated user is following. I am not really sure how to design and execute this.

This is what I am thinking of implementing.

  1. Get all the users that the user is following.
  2. Loop through this array of users, get the latest 4 files for each user, and add them to an array.
  3. Somehow order these files (this array) by creation date.
  4. Return it.
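
For illustration, the four steps above can be sketched as follows. This is only a minimal in-memory sketch in Python, not PHP driver code: `FOLLOWING`, `FILES`, and the field names `owner`, `name`, and `created` are hypothetical stand-ins for the real collections, and a real implementation would run one query per followed user instead of filtering a list.

```python
from datetime import datetime
from heapq import nlargest

# Hypothetical in-memory stand-ins for the follow data and the files collection.
FOLLOWING = {"alice": ["bob", "carol"]}
FILES = [
    {"owner": "bob", "name": "a.png", "created": datetime(2011, 11, 17, 9, 0)},
    {"owner": "bob", "name": "b.png", "created": datetime(2011, 11, 17, 9, 30)},
    {"owner": "carol", "name": "c.png", "created": datetime(2011, 11, 17, 9, 15)},
    {"owner": "dave", "name": "d.png", "created": datetime(2011, 11, 17, 9, 45)},
]


def latest_files(owner, limit):
    """Return the newest `limit` files uploaded by one user."""
    own = [f for f in FILES if f["owner"] == owner]
    return nlargest(limit, own, key=lambda f: f["created"])


def fan_in_feed(user, per_user=4):
    """Collect each followed user's latest files, then order them by date."""
    merged = []
    for followed in FOLLOWING.get(user, []):                 # steps 1 and 2
        merged.extend(latest_files(followed, per_user))
    merged.sort(key=lambda f: f["created"], reverse=True)    # step 3
    return merged                                            # step 4
```

The cost of this approach grows with the number of followed users, because every feed request repeats the per-user lookups and the final sort.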

Is this really the optimal way? Is there a better way? Users are stored in the collection Users and files in the collection files. Follow relationships are stored in the collection users.followers.

halfer
Jonathan Clark

1 Answer

This is the fan-in vs. fan-out problem. I'd suggest you try fan-out:

Keep a single feed collection for all your users. When a user uploads a file, insert a new feed item for each of her friends (that is, her followers). A document in that collection could look like this:

{
    "_id": (some id),
    "UserId": (id of the user who 'owns', i.e. reads, this feed),
    "FriendId": (id of the friend who posted the file),
    "FriendName": "John Doe", (name of the friend, denormalized)
    "Timestamp": ...
}

Use a compound index {UserId, Timestamp}.

This approach is write-heavy: If Jane has hundreds of friends, these hundreds of inserts will take their time. On the other hand, uploading a file generally takes a lot of time anyway, so the overhead is negligible, and your reads will be ridiculously simple.
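
To make the write and read paths concrete, here is a minimal in-memory sketch in Python. It is not MongoDB driver code: `FEED` stands in for the single feed collection, and `FOLLOWERS`, `NAMES`, `publish_upload`, and `read_feed` are hypothetical names introduced for illustration. The document fields follow the layout shown earlier.

```python
from datetime import datetime

# Hypothetical stand-ins: FOLLOWERS maps an uploader to the users who follow
# her, NAMES holds display names, FEED plays the role of the feed collection.
FOLLOWERS = {"jane": ["alice", "bob"], "john": ["alice"]}
NAMES = {"jane": "Jane Roe", "john": "John Doe"}
FEED = []


def publish_upload(uploader, timestamp):
    """Fan-out on write: insert one feed document per follower."""
    docs = [
        {
            "UserId": follower,                # reader who owns this feed item
            "FriendId": uploader,              # user who posted the file
            "FriendName": NAMES[uploader],     # denormalized display name
            "Timestamp": timestamp,
        }
        for follower in FOLLOWERS.get(uploader, [])
    ]
    FEED.extend(docs)
    return len(docs)  # number of inserts this upload caused


def read_feed(user_id, limit=20):
    """Read path: filter on UserId and return the newest items first."""
    mine = [d for d in FEED if d["UserId"] == user_id]
    mine.sort(key=lambda d: d["Timestamp"], reverse=True)
    return mine[:limit]
```

In this sketch, an upload by a user with *n* followers causes *n* inserts, while a read is a single filter on `UserId` ordered by `Timestamp`, which is the access pattern the compound index is meant to serve.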

Of course, this can be further optimized with more effort, but it should do fine for quite a bit of traffic.

mnemosyn
  • Sounds very interesting! You say it is very write-heavy. Approximately how long will writing to these feeds take? 2-5 seconds for 500 writes? Faster? If a user is following 1000 users, will this feed document get very big and hard to handle? – Jonathan Clark Nov 17 '11 at 09:29
  • The other way around: If a user has thousands of followers, it will be heavy. How long it really takes depends on a zillion factors, but you should be able to do several thousand inserts per second on commodity hardware. You can also do the fan-out offline: a background tool keeps searching for newly posted files and inserts the news items when it finds such items. Done right, the delay is only a couple of seconds but you greatly reduce load on the web server. – mnemosyn Nov 17 '11 at 09:34
  • A delay is no problem for me. So just to see if I understand: I create a collection named feeds, and each user gets a single document in the feeds collection. Then, when a user uploads a file, I post metadata about this file to each follower's feed document as a subdocument? Is this correct? – Jonathan Clark Nov 17 '11 at 09:47
  • Every newsfeed item gets its own document (not just every user), so when a user uploads a file you will have to insert *n* new documents where *n* is the number of followers the user has. No subdocuments at work here. – mnemosyn Nov 17 '11 at 09:52
  • Ah ok. You write "Keep a feed collection for each user." Do you mean a separate collection or a separate document? How many collections can a database hold? – Jonathan Clark Nov 17 '11 at 09:59
  • @mnemosyn, I was going to suggest the same thing. +1 :) – RameshVel Nov 17 '11 at 10:05
  • Can a database hold 1.000.000 collections? – Jonathan Clark Nov 17 '11 at 10:12
  • Hardly. Each collection needs at least 1.5KB in the namespace, and the namespace file size is limited to 2GB. Like I said, I don't recommend it. The documentation gives more insight: http://www.mongodb.org/display/DOCS/Using+a+Large+Number+of+Collections – mnemosyn Nov 17 '11 at 12:58
  • Huh? Ok, I edited my answer... The above solution uses only a single collection; the pseudo-code is more expressive. By 'a collection for each user' I wanted to say that the collection is indexed by `UserId`. Sorry for the confusing wording. – mnemosyn Nov 17 '11 at 13:35
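
The offline fan-out mentioned in the comments could be sketched roughly as follows. This is a minimal in-memory Python sketch under stated assumptions: the `fanned_out` flag, the `FileId` field, and the function name `fan_out_pending` are hypothetical and do not appear in the answer. A real background tool would query the files collection for unprocessed uploads and run this pass repeatedly.

```python
from datetime import datetime

# Hypothetical stand-ins for the follower data, the files collection,
# and the single feed collection.
FOLLOWERS = {"jane": ["alice", "bob"]}
FILES = [
    {"_id": 1, "owner": "jane",
     "created": datetime(2011, 11, 17, 9, 0), "fanned_out": False},
]
FEED = []


def fan_out_pending():
    """One pass of the background job: fan out files not yet processed."""
    inserted = 0
    for f in FILES:
        if f["fanned_out"]:
            continue  # already distributed to the followers' feeds
        for follower in FOLLOWERS.get(f["owner"], []):
            FEED.append({
                "UserId": follower,
                "FriendId": f["owner"],
                "FileId": f["_id"],        # hypothetical back-reference
                "Timestamp": f["created"],
            })
            inserted += 1
        f["fanned_out"] = True  # mark so the next pass skips this file
    return inserted
```

With this split, the upload request only stores the file, and the feed inserts happen a short time later outside the web request.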