-2

How does one implement live like and dislike counts (or, say, view counts) in CouchDB/Couchbase in the most efficient way? One could use a reduce to calculate the count each time, and on the front end only increment and decrement locally, with a single API call to fetch the result.

But every post will have, say, millions of views, likes and dislikes.
If we have millions of such posts (on a social networking site), the index will simply be too big.

Anurag Vohra
  • 1,781
  • 12
  • 28

1 Answer

2

In terms of Cloudant, the described use case requires a bit of care, because it combines:

  1. Fast writes
  2. Ever-growing data set
  3. Potentially global queries with aggregations

The key here is to use an immutable data model--don't update any existing documents, only create new ones. This means that you won't have to suffer update conflicts as the load increases.
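As a rough illustration of the immutable model, every like, dislike or view could become its own brand-new document instead of an update to a counter on the post. The field names (`type`, `post_id`, `user_id`, `created_at`) and the helper `new_reaction_doc` are assumptions made for this sketch, not anything CouchDB or Cloudant prescribes.

```python
import uuid
from datetime import datetime, timezone

ALLOWED_KINDS = {"like", "dislike", "view"}


def new_reaction_doc(post_id: str, user_id: str, kind: str) -> dict:
    """Build a new document for a single like/dislike/view event.

    The document never carries a _rev, because it is only ever created,
    never updated, so concurrent writers cannot conflict on it.
    """
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown reaction kind: {kind!r}")
    return {
        "_id": uuid.uuid4().hex,          # fresh id for every event
        "type": kind,                      # hypothetical field name
        "post_id": post_id,                # the post this event refers to
        "user_id": user_id,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```

Each returned dict would then be written with a plain document insert; the post document itself is never touched.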

So a post is its own document in one database, and the likes are stored separately. For likes, you have a few options. The classic CouchDB solution would be to have a separate database with "likes" documents containing the post id of the post they refer to, with a view emitting the post id, aggregated by the built-in _count. This would be a pretty efficient solution in this case, but yes, indexes do occupy space on Couch-like databases (just as with any other database).
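A minimal sketch of that first option, assuming a design document named `_design/likes` with a view called `by_post` and a `post_id` field on each like document (all three names are made up for illustration). The map function is JavaScript held in a string, as CouchDB expects; the Python only assembles the design document and the query path.

```python
import json
from urllib.parse import urlencode

# Design document: the map emits the post id, the built-in _count reduces.
LIKES_DESIGN_DOC = {
    "_id": "_design/likes",
    "views": {
        "by_post": {
            "map": "function (doc) { if (doc.post_id) { emit(doc.post_id, null); } }",
            "reduce": "_count",
        }
    },
}


def count_query_path(db: str, post_id: str) -> str:
    """Return the path that asks the view for the like count of one post.

    View keys are JSON, so the post id is JSON-encoded before URL-encoding.
    group=true makes the reduce return one row per distinct key.
    """
    params = urlencode({"key": json.dumps(post_id), "group": "true"})
    return f"/{db}/_design/likes/_view/by_post?{params}"
```

A GET on that path should return something of the shape `{"rows": [{"key": "post-1", "value": <count>}]}`, with an empty `rows` list when the post has no likes yet.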

The second option would be to exploit the _id field, as this is an index you get for free. If you prefix the like-documents' ids with the liked post's id, you can do an _all_docs query with a start and end key to get all the likes for that post.
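The id-prefix idea could look roughly like this. The `post_id:user_id` id layout, the `:` separator and both helper names are assumptions for the sketch; `\ufff0` is the high sentinel character commonly used in CouchDB range queries to mean "everything with this prefix".

```python
import json

SEPARATOR = ":"  # assumed separator; post ids must not contain it


def like_doc_id(post_id: str, user_id: str) -> str:
    """Compose a like document id that starts with the liked post's id."""
    if SEPARATOR in post_id:
        raise ValueError("post_id must not contain the separator")
    return f"{post_id}{SEPARATOR}{user_id}"


def likes_range_params(post_id: str) -> dict:
    """Query parameters for _all_docs covering every like of one post.

    startkey/endkey are JSON-encoded strings, as the _all_docs API expects.
    """
    prefix = f"{post_id}{SEPARATOR}"
    return {
        "startkey": json.dumps(prefix),
        "endkey": json.dumps(prefix + "\ufff0"),
    }
```

Note that `_all_docs` returns rows rather than a reduced count, so the client would count the rows it gets back (or page through them for very popular posts).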

Third - recent versions of CouchDB and Cloudant have the concept of partitioned databases, which very loosely speaking can be viewed as a formalised version of option two above, where you nominate a partition key which is used to ensure a degree of storage locality behind the scenes -- all documents within the same partition are stored in the same shard. This makes retrieval faster -- and on Cloudant, also cheaper. In your case you'd create a partitioned "likes" database with the partition key being the post-id. Glynn Bird wrote up a great intro to partitioned DBs here.
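For the partitioned variant, a hedged sketch of the pieces involved: in a partitioned database the document id has the form `partition:key`, the database is created with `partitioned=true`, and per-partition queries go through the `_partition` endpoint. The database name `likes` and the helper names are placeholders chosen for this example.

```python
import uuid
from urllib.parse import quote


def create_partitioned_db_path(db: str) -> str:
    """Path for the PUT request that creates a partitioned database."""
    return f"/{db}?partitioned=true"


def partitioned_like_id(post_id: str) -> str:
    """Document id of the form '<partition key>:<document key>'.

    The post id is the partition key, so all likes for one post
    end up in the same partition.
    """
    if ":" in post_id:
        raise ValueError("partition key must not contain ':'")
    return f"{post_id}:{uuid.uuid4().hex}"


def partition_all_docs_path(db: str, post_id: str) -> str:
    """Path that lists only the documents in one post's partition."""
    return f"/{db}/_partition/{quote(post_id, safe='')}/_all_docs"
```

A view query can be scoped the same way, by placing `_partition/<post id>` between the database name and the `_design/...` part of the path.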

Your remaining issue is the ever-growing data set. At Cloudant, we'd expect to get to know you well once your data volume goes beyond single-digit TBs. If you expect to reach this kind of volume, it's worth tackling that up-front. Any of the likes schemes above could most likely be time-boxed and aggregated once a quarter/month/week or whatever suits your model.
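One way to picture the time-boxing step: periodically fold the individual like documents of a closed period into one small summary document per post, after which the raw documents for that period can be archived or dropped. This is only a sketch under assumptions; the `created_at` and `post_id` fields, the `rollup:` id scheme and the `like_rollup` type are invented here, and the aggregation runs client-side over documents already fetched.

```python
from collections import Counter
from datetime import datetime


def rollup_by_month(like_docs: list) -> list:
    """Collapse individual like documents into per-post monthly summaries.

    Expects each document to carry 'post_id' and an ISO 8601 'created_at'
    with an explicit offset, e.g. '2024-01-15T10:00:00+00:00'.
    """
    counts = Counter()
    for doc in like_docs:
        month = datetime.fromisoformat(doc["created_at"]).strftime("%Y-%m")
        counts[(doc["post_id"], month)] += 1
    return [
        {
            "_id": f"rollup:{post_id}:{month}",  # hypothetical id scheme
            "type": "like_rollup",
            "post_id": post_id,
            "period": month,
            "count": n,
        }
        for (post_id, month), n in sorted(counts.items())
    ]
```

The total for a post is then the sum of its rollup counts plus a live count over the current, still-open period only, which keeps the live index small.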

xpqz
  • 3,617
  • 10
  • 16