
I'm trying to learn CouchDB by working through a simple RSS reader web application. The requirements are:

  • Allow each user to import any number of feeds into their list
  • User can add tags to each feed
  • For each feed, maintain a list of the last 50 articles in the database

  • Users should get an update each time any feed they subscribe to adds new items.
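One way to enforce the 50-article cap is to periodically work out which articles of a feed fall outside the newest 50. The sketch below is a plain-Python stand-in that assumes each article carries a `published` timestamp (a hypothetical field, not in the original design). The returned ids could then be deleted in CouchDB, for example with a `_bulk_docs` request that sets `_deleted: true` on each document.

```python
from dataclasses import dataclass

MAX_ARTICLES_PER_FEED = 50  # the cap from the requirements


@dataclass
class Article:
    id: str
    feed_id: str
    published: int  # hypothetical field: publication time as a Unix timestamp


def articles_to_prune(articles: list[Article], feed_id: str,
                      limit: int = MAX_ARTICLES_PER_FEED) -> list[str]:
    """Return the ids of articles in `feed_id` that fall outside the newest `limit`."""
    in_feed = [a for a in articles if a.feed_id == feed_id]
    # Newest first, so everything after the first `limit` entries is stale.
    in_feed.sort(key=lambda a: a.published, reverse=True)
    return [a.id for a in in_feed[limit:]]


articles = [Article(f"a{i}", "feed1", published=i) for i in range(55)]
print(articles_to_prune(articles, "feed1"))  # ['a4', 'a3', 'a2', 'a1', 'a0']
```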

After reading various guides, as well as "Principles for Modeling CouchDB Documents" (a great related question), here's how I imagine it would be structured:

  • Feeds

    • Name
    • Last Updated
  • Articles

    • FeedId
    • Title
    • Text
  • Users

    • id
    • Feeds: [feed1, feed2]
    • Tags: {funny: [article1, article2]} // Maybe a separate db with userid, articleid, tagname?

Then, for each user, I'd create a view of articles by feed and merge in the user's tags to present them in the UI.
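CouchDB views are defined per database rather than per user, so a single view keyed by feed id could be queried with each user's feed list and the tags merged in afterwards. The plain-Python sketch below roughly mimics that: the map function emits `(feedId, title)` and the query returns rows in the order of the requested keys. The field names (`type`, `feedId`, `title`, `feeds`, `tags`) are assumptions modeled on the structure above.

```python
# A minimal simulation of one shared CouchDB view. In a real design doc the map
# function would be JavaScript, e.g.:
#   function (doc) { if (doc.type === "article") emit(doc.feedId, doc.title); }
# Field names ("type", "feedId", "title", "feeds", "tags") are assumptions.

def articles_by_feed_map(doc):
    """Map step: emit (feedId, title) for every article document."""
    if doc.get("type") == "article":
        yield doc["feedId"], doc["title"]


def query_view(docs, map_fn, keys):
    """Roughly mimic querying the view with {"keys": [...]}: rows are grouped
    in the order of `keys`, and by document id within each key."""
    index = sorted(
        ((key, doc["_id"], value) for doc in docs for key, value in map_fn(doc)),
        key=lambda row: (row[0], row[1]),
    )
    rows = []
    for wanted in keys:
        rows.extend(
            {"id": doc_id, "key": key, "value": value}
            for key, doc_id, value in index
            if key == wanted
        )
    return rows


def user_timeline(docs, user):
    """Query the shared view with this user's feeds, then attach their tags."""
    tags_by_article = {}
    for tag, article_ids in user["tags"].items():
        for article_id in article_ids:
            tags_by_article.setdefault(article_id, []).append(tag)
    rows = query_view(docs, articles_by_feed_map, user["feeds"])
    return [dict(row, tags=tags_by_article.get(row["id"], [])) for row in rows]


docs = [
    {"_id": "a1", "type": "article", "feedId": "feed1", "title": "Hello"},
    {"_id": "a2", "type": "article", "feedId": "feed2", "title": "World"},
    {"_id": "a3", "type": "article", "feedId": "feed3", "title": "Unsubscribed"},
]
user = {"_id": "user1", "feeds": ["feed1", "feed2"], "tags": {"funny": ["a1"]}}
for row in user_timeline(docs, user):
    print(row["key"], row["value"], row["tags"])
```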

Am I on the right track here? How would you structure this?

Naren

2 Answers


I don't think your design is a great fit for CouchDB. In particular, it seems to me that you'd have to update the user documents a lot (mostly to update article tags), and the user documents would grow very large over time.

The interactions in an RSS reader make this problem an awkward fit for CouchDB. You have to keep per-user tags, and you (a) don't want to keep them in the user document because you'd have to update the user document all the time, and (b) don't want to keep them in article documents because you'd have to update the article documents all the time.

I think my ideal solution would involve per-user databases; that would make the problem easily tractable. You have feed documents and article documents, and you can just keep the user's tags in the article documents. There's a fair amount of duplication (because you have to store articles in every user database), but at least it's easy (and relatively fast) to query.
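The per-user-database idea can be sketched with an in-memory stand-in for the server: each new article is copied into every subscriber's database, and tags are stored inside that private copy. The `reader_<user>` naming scheme and the `tags` list field are hypothetical choices for illustration, not a CouchDB convention.

```python
import copy

# Hypothetical in-memory stand-in for a CouchDB server: db name -> {doc id -> doc}.
Databases = dict[str, dict[str, dict]]


def user_db(user_id: str) -> str:
    # Hypothetical naming scheme: one database per user.
    return f"reader_{user_id}"


def fan_out(article: dict, subscribers: list[str], server: Databases) -> None:
    """Store a private copy of a new article in each subscriber's database."""
    for user_id in subscribers:
        db = server.setdefault(user_db(user_id), {})
        if article["_id"] not in db:  # keep an existing copy and its tags
            doc = copy.deepcopy(article)
            doc["tags"] = []
            db[doc["_id"]] = doc


def add_tag(server: Databases, user_id: str, article_id: str, tag_name: str) -> None:
    """Tagging touches only this user's copy, so users never update the same doc."""
    doc = server[user_db(user_id)][article_id]
    if tag_name not in doc["tags"]:
        doc["tags"].append(tag_name)


server: Databases = {}
article = {"_id": "feed1:42", "feedId": "feed1", "title": "Hello"}
fan_out(article, ["alice", "bob"], server)
add_tag(server, "alice", "feed1:42", "funny")
print(server["reader_alice"]["feed1:42"]["tags"])  # ['funny']
print(server["reader_bob"]["feed1:42"]["tags"])    # []
```

The duplication mentioned above is visible here: every subscriber holds a full copy of each article.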

djc
  • I don't think updating a CouchDB doc many times is necessarily a bad thing if you don't mind compacting the database and losing revision history. – Teddy May 21 '13 at 13:24
  • It does have the potential for document conflicts, which are just annoying to resolve. – djc May 21 '13 at 14:36
  • I think per-user databases are overkill for this. Take 50 articles per feed and assume that on average a user subscribes to 10 feeds: that's 500 articles. If the articles are indexed/partitioned by feed_id, they are relatively inexpensive to query. – Subhas May 21 '13 at 21:33
  • Oh well, I guess my point is about the per-user duplication of article documents. – djc May 22 '13 at 14:45

As you might have seen, NoSQL is all about compromises based on usage scenarios. At some point, you will have to write views and queries, and there is no one design that fits all.

In your scenario, you have said that each feed will keep only the latest 50 articles, so articles quickly become irrelevant (and so does any data associated with them). So if you store the tags in the User model, you will have to update the user object in three situations: (1) when the user tags an article, (2) when the user removes a tag, and (3) when the article goes stale and is deleted. Case (3) is inevitable.

It's better to store tags in the Article, so that they get deleted along with the Article.

  • Feeds

    • Name
    • Last Updated
  • Articles

    • FeedId
    • Title
    • Text
    • Tags: { "tag1" : [ "user1", "user2", ... ], "tag2" : [ "user3", "user4", ... ] }
  • Users

    • id
    • Feeds: [feed1, feed2]
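With tags stored on the article as `{tag: [users]}`, a view emitting `[user, tag]` would let one user look up every article they tagged, and such index entries disappear when the article is deleted. The sketch below simulates that view in plain Python; the JavaScript map function in the comment, the `type` field, and the helper names are assumptions. In CouchDB the per-user lookup would typically use `startkey=[user]` and `endkey=[user, {}]`, since objects collate after strings.

```python
# Simulated CouchDB view over article docs with Tags shaped as {tag: [users]}.
# In JavaScript the map function might read:
#   function (doc) {
#     if (doc.type === "article" && doc.tags) {
#       for (var tag in doc.tags) {
#         doc.tags[tag].forEach(function (user) { emit([user, tag], null); });
#       }
#     }
#   }

def tags_by_user_map(doc):
    """Map step: emit ([user, tag], None) for every (tag, user) pair on an article."""
    if doc.get("type") == "article":
        for tag, users in doc.get("tags", {}).items():
            for user in users:
                yield [user, tag], None


def articles_tagged_by(docs, user):
    """Like querying the view with startkey=[user] and endkey=[user, {}]."""
    rows = sorted(
        (key, doc["_id"]) for doc in docs for key, _ in tags_by_user_map(doc)
    )
    return [(key[1], doc_id) for key, doc_id in rows if key[0] == user]


def add_tag(article, user, tag):
    """Tagging updates the article doc itself, so the tag goes away with it."""
    users = article.setdefault("tags", {}).setdefault(tag, [])
    if user not in users:
        users.append(user)


docs = [
    {"_id": "a1", "type": "article", "feedId": "feed1",
     "tags": {"tag1": ["user1", "user2"], "tag2": ["user3"]}},
    {"_id": "a2", "type": "article", "feedId": "feed1", "tags": {}},
]
add_tag(docs[1], "user1", "tag2")
print(articles_tagged_by(docs, "user1"))  # [('tag1', 'a1'), ('tag2', 'a2')]
```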

You can see that I'm storing each tag together with the users who applied it. You could also do the reverse, { "user1" : [ "tag1" ], "user2" : [ "tag1" ], "user3" : [ "tag2" ], "user4" : [ "tag2" ], ... }, if you believe that helps processing (based on your filtering requirements).
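Switching between the two layouts is a simple inversion. Because one user may apply several tags to the same article, this sketch maps each user to a list of tags so that no (user, tag) pair is lost; the function name is hypothetical.

```python
def invert_tags(users_by_tag: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn {tag: [users]} into {user: [tags]} without losing any pairs."""
    tags_by_user: dict[str, list[str]] = {}
    for tag, users in users_by_tag.items():
        for user in users:
            tags_by_user.setdefault(user, []).append(tag)
    return tags_by_user


print(invert_tags({"tag1": ["user1", "user2"], "tag2": ["user3", "user1"]}))
# {'user1': ['tag1', 'tag2'], 'user2': ['tag1'], 'user3': ['tag2']}
```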

Subhas