In terms of Cloudant, the described use case requires a bit of care:
- Fast writes
- Ever-growing data set
- Potentially global queries with aggregations
The key here is to use an immutable data model--don't update any existing documents, only create new ones. This means that you won't have to suffer update conflicts as the load increases.
So a post is its own document in one database, and the likes are stored separately. For likes, you have a few options. The classic CouchDB solution would be to have a separate database with "likes" documents containing the post id of the post they refer to, with a view emitting the post id, aggregated by the built-in _count reducer. This would be a pretty efficient solution in this case, but yes, indexes do occupy space on Couch-like databases (just as with any other database).
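As a rough sketch of the view-based option, the snippet below builds a design document and the query for one post's like count. The database name, the design document name "likes", the view name "by_post" and the post_id field are all assumptions made for illustration; only the _count reducer and the view URL layout come from CouchDB itself.

```python
import json

# Hypothetical design document for a "likes" database.
# The map function is JavaScript stored as a string; it emits the
# post id of every like document, and _count tallies the rows.
LIKES_DESIGN_DOC = {
    "_id": "_design/likes",
    "views": {
        "by_post": {
            "map": "function (doc) { if (doc.post_id) { emit(doc.post_id, null); } }",
            "reduce": "_count",
        }
    },
}


def like_count_request(db, post_id):
    """Return the (path, query params) for counting one post's likes.

    View keys are JSON values, so the post id is JSON-encoded.
    The reduce step runs by default when a view defines a reducer.
    """
    path = f"/{db}/_design/likes/_view/by_post"
    params = {"key": json.dumps(post_id)}
    return path, params
```

Issuing a GET against that path with those parameters should return the reduced count for the post; passing group=true instead of key would give one count per post.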
A second option would be to exploit the _id field, as this is an index you get for free. If you prefix the like-documents' ids with the liked post's id, you can do an _all_docs query with a start and end key to get all the likes for that post.
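A minimal sketch of the id-prefix approach follows. The ":" separator and the helper names are assumptions, not anything Cloudant mandates; the "\ufff0" suffix is the usual CouchDB idiom for "everything starting with this prefix". It assumes post ids never contain the separator.

```python
import json
import uuid


def like_id(post_id, unique=None):
    """Build a like document id prefixed with the liked post's id."""
    return f"{post_id}:{unique or uuid.uuid4().hex}"


def likes_range_params(post_id):
    """Query parameters for _all_docs covering every like of one post.

    startkey and endkey are JSON-encoded strings. The high code point
    \ufff0 sorts after any character likely to follow the prefix.
    """
    prefix = f"{post_id}:"
    return {
        "startkey": json.dumps(prefix),
        "endkey": json.dumps(prefix + "\ufff0"),
    }
```

These parameters would be sent to GET /{db}/_all_docs. Note that _all_docs has no reducer, so the count is the number of rows returned.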
Third, recent versions of CouchDB and Cloudant have the concept of partitioned databases, which very loosely speaking can be viewed as a formalised version of option two above, where you nominate a partition key which is used to ensure a degree of storage locality behind the scenes -- all documents within the same partition are stored in the same shard. This means that it's faster to retrieve -- and on Cloudant, also cheaper. In your case you'd create a partitioned "likes" database with the partition key being the post-id. Glynn Bird wrote up a great intro to partitioned DBs here.
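To make the partitioned variant concrete, here is a hedged sketch of how the ids and request paths might look. In a partitioned database the document id has the form partitionkey:dockey, the database is created with PUT /{db}?partitioned=true, and partition-scoped queries go through /{db}/_partition/{partition}/... The helper names and the choice of like key are assumptions.

```python
from urllib.parse import quote


def partitioned_like_id(post_id, like_key):
    """Build an id of the form '<partition key>:<document key>'.

    The partition key (here the post id) must not contain ':' or
    start with '_'.
    """
    if ":" in post_id or post_id.startswith("_"):
        raise ValueError("post_id is not usable as a partition key")
    return f"{post_id}:{like_key}"


def create_db_path(db):
    """Path for PUT to create a partitioned database."""
    return f"/{quote(db, safe='')}?partitioned=true"


def partition_all_docs_path(db, post_id):
    """Path for listing every like document in one post's partition."""
    return f"/{quote(db, safe='')}/_partition/{quote(post_id, safe='')}/_all_docs"
```

Compared with the plain id-prefix scheme, no start or end key is needed: the partition in the path already scopes the query to a single post.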
Your remaining issue is that of ever-growth. At Cloudant, we'd expect to get to know you well once your data volume goes beyond single-digit TBs. If you expect to reach this kind of volume, it's worth tackling that up-front. Any of the likes schemes above could most likely be time-boxed and aggregated once a quarter/month/week or whatever suits your model.
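One way the time-boxed aggregation could look, purely as a sketch: roll the individual like documents for a period up into one summary document per post and period, after which the raw likes for that period could be archived or dropped. The field names (post_id, created_at, count, period), the monthly bucket and the summary id layout are all assumptions.

```python
from collections import Counter
from datetime import datetime


def month_bucket(iso_timestamp):
    """Map an ISO 8601 timestamp (no 'Z' suffix) to a 'YYYY-MM' bucket."""
    return datetime.fromisoformat(iso_timestamp).strftime("%Y-%m")


def rollup_likes(likes):
    """Aggregate like documents into one summary document per post and month.

    The summaries are new documents, so the model stays immutable:
    nothing existing is updated in place.
    """
    counts = Counter(
        (like["post_id"], month_bucket(like["created_at"])) for like in likes
    )
    return [
        {
            "_id": f"{post_id}:summary:{period}",
            "type": "like_summary",
            "post_id": post_id,
            "period": period,
            "count": count,
        }
        for (post_id, period), count in sorted(counts.items())
    ]
```

The total for a post is then the sum of its summary counts plus any likes not yet rolled up.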