
I'm looking for a good way to store comments in Cassandra. Can someone suggest a good Cassandra data model for storing comments?

The data model should let me retrieve a given number of these comments with phpcassa.

This was my idea:

Comments = {
    CommentId1:{
        CommentAuthor,
        Content,
        Timestamp
    },
    CommentId2:{
        CommentAuthor,
        Content,
        Timestamp
    }
    ...
}

CommentsLine = {
    EntryId1:{
        CommentId1: timestamp,
        CommentId2: timestamp,
        CommentId3: timestamp,
        ...
    }
    ...
}

But I'm not sure this is the best way. Thanks for your help.

siannone

2 Answers

Your solution looks good, but it may be better to use the timestamp as the column name in the CommentsLine column family. That way the comments are kept in order.

CommentsLine = {
    EntryId1:{
        timestamp: CommentId1 ,
        timestamp: CommentId2 ,
        timestamp: CommentId3 ,
        ...
    }
    ...
}
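
As a rough illustration of why timestamp-named columns help, here is a minimal in-memory sketch in Python. It is not phpcassa code: plain dicts and sorted lists stand in for the two column families, and the function names (`add_comment`, `first_comments`) are made up for the example. In a real row, column names must be unique, so two comments with the same timestamp would need something like a TimeUUID as the column name.

```python
from bisect import insort

# Plain Python structures stand in for the two column families (assumption).
comments = {}        # comment_id -> {"author", "content", "timestamp"}
comments_line = {}   # entry_id -> sorted list of (timestamp, comment_id)


def add_comment(entry_id, comment_id, author, content, timestamp):
    """Store the comment and index it under its entry by timestamp."""
    comments[comment_id] = {
        "author": author,
        "content": content,
        "timestamp": timestamp,
    }
    # Keeping (timestamp, comment_id) pairs sorted mimics columns
    # that are ordered by column name inside a row.
    insort(comments_line.setdefault(entry_id, []), (timestamp, comment_id))


def first_comments(entry_id, count):
    """Return the `count` oldest comments of an entry, in timestamp order."""
    timeline = comments_line.get(entry_id, [])
    return [comments[comment_id] for _, comment_id in timeline[:count]]
```

Because the index is already sorted, taking the first few comments is just a bounded read of the row, with no sorting on the client.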
sahid
  • In my solution, if you want sorted comments, you can get them from the posts table with a simple slice over the column storing {Posted Timestamp} -> {{Comment author user ID} and other metadata as JSON} – MaX Jul 12 '12 at 15:41
  • Yes, perhaps. I don't know your schema, but in this example it's better to use the columns of a row key to keep your comments sorted. – sahid Jul 12 '12 at 15:50
  • @sahid OK, but how do I get only the first 5 results, then another 5 results, etc.? – siannone Jul 13 '12 at 13:49
  • The API lets you limit the number of columns retrieved for a given row key, but you're right that it doesn't support an offset for columns. – sahid Jul 13 '12 at 14:00
  • @sahid Thank you, just one more question. Should I make requests via CQL using `execute_cql_query()` or via the `get()`, `insert()` functions, etc.? – siannone Jul 13 '12 at 14:10
  • That depends on you; I don't think there are problems with either. From my personal point of view, I don't like using CQL because most of the time it makes us forget to think in non-relational terms. – sahid Jul 13 '12 at 14:53
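
The "limit but no offset" point in the comments above is usually worked around by remembering the last column seen and starting the next slice from there. The following is a hedged Python sketch of that idea over an in-memory sorted list, not a phpcassa call; `get_slice` and `iter_pages` are hypothetical names. As far as I know, the start column of a real Cassandra slice is inclusive, so a real client would typically ask for one extra column and drop the first.

```python
from bisect import bisect_right


def get_slice(columns, after=None, count=5):
    """Return up to `count` (name, value) columns whose name is > `after`.

    `columns` must already be sorted by name, like columns in a row.
    """
    names = [name for name, _ in columns]
    begin = 0 if after is None else bisect_right(names, after)
    return columns[begin:begin + count]


def iter_pages(columns, page_size=5):
    """Yield successive pages, resuming after the last column name seen."""
    last_seen = None
    while True:
        page = get_slice(columns, after=last_seen, count=page_size)
        if not page:
            return
        yield page
        last_seen = page[-1][0]  # remember where this page ended
```

With twelve comments and a page size of 5, this yields pages of 5, 5 and 2 columns without ever needing an offset.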

I've solved this problem once. Here is how I did it: a separate column family, Comments, where each row key is a comment ID (my IDs had the form {Parent Post ID}-{Comment author user ID}-{Comment Posted Timestamp}) and the columns hold the author, comment, posted date, etc.

You can use your own format; keeping comments in their own column family makes sure they are distributed.

Once that's done, you have to link each comment to its parent. You can do so by keeping a column named comments on the post that holds {Posted Timestamp} -> {{Comment author user ID} and other metadata as JSON}. (This was specific to my scenario; you should design something that suits yours.)

When somebody comments, you can generate the posted microtime in PHP and insert it into Cassandra accordingly. This format ensures that Cassandra distributes comments around the ring and that the post row stays minimal, containing only the required information.

Loading comments back requires selecting those columns and then doing a multiget call on the Comments column family.
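
To make the two-step read concrete, here is a small in-memory sketch in Python. It only models the layout described above: dicts stand in for the column families, the JSON field name `user` is an assumption, and `multiget` here is a local helper, not the client library's method.

```python
import json

# In-memory stand-ins for the two column families (assumption).
comments_cf = {}   # comment row key -> {"author", "comment", "posted"}
posts_cf = {}      # post id -> {posted timestamp: JSON metadata}


def comment_key(post_id, user_id, posted):
    """Build a row key of the form {post}-{user}-{timestamp}."""
    return f"{post_id}-{user_id}-{posted}"


def add_comment(post_id, user_id, posted, text):
    key = comment_key(post_id, user_id, posted)
    comments_cf[key] = {"author": user_id, "comment": text, "posted": posted}
    # The post row keeps only a small pointer: timestamp -> JSON metadata.
    posts_cf.setdefault(post_id, {})[posted] = json.dumps({"user": user_id})


def multiget(keys):
    """Fetch several comment rows at once (local helper, not a library call)."""
    return {key: comments_cf[key] for key in keys if key in comments_cf}


def load_comments(post_id, count):
    """Slice the post's index, rebuild the row keys, then multiget the rows."""
    index = posts_cf.get(post_id, {})
    stamps = sorted(index)[:count]  # oldest first
    keys = [
        comment_key(post_id, json.loads(index[ts])["user"], ts)
        for ts in stamps
    ]
    rows = multiget(keys)
    return [rows[key] for key in keys]
```

The post row stays small because it stores only what is needed to rebuild each comment's row key; the comment bodies live in their own rows.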

MaX