I want to query two column families at once. I'm using the cassandra-cql gem for Rails, and my column families are:

users
following
followers
user_count
message_count
messages

Now I want to get all messages from the people a user is following. Is there some kind of multiget in cassandra-cql, or is there another way, such as changing the data model, to get this kind of data?

user934801

1 Answer

I would call your current data model a traditional entity/relational design. That makes sense with an SQL database, where you rely on joins to build views that span multiple entities.

Cassandra cannot perform joins. So instead of modeling your data based on your entities and relations, you should model it based on how you intend to query it. For your example of 'all messages from the people a user is following', you might have a column family where the row key is the user ID and the columns are all the messages from the people that user follows (where the column name is a timestamp+userid and the value is the message):

RowKey                              Columns
-------------------------------------------------------------------
|        | TimeStamp0:UserA | TimeStamp1:UserB | TimeStamp2:UserA |
| UserID |------------------|------------------|------------------|
|        | Message          | Message          | Message          |
-------------------------------------------------------------------

You would probably also want a column family with all the messages a specific user has written (I'm assuming that the message is broadcast to all users instead of being addressed to one particular user):

RowKey                   Columns
--------------------------------------------------------
|        | TimeStamp0 | TimeStamp1 | TimeStamp2        |
| UserID |------------|------------|-------------------|
|        | Message    | Message    | Message           |
--------------------------------------------------------

Now when you create a new message you will need to insert it in multiple places. But when you need to list all messages from the people a user is following, you only need to fetch from one row (which is fast).
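
The write-time fan-out described above can be sketched in plain Ruby, using in-memory hashes as stand-ins for the two column families. This is only an illustration under stated assumptions: the names `FOLLOWERS`, `USER_MESSAGES`, `TIMELINE`, `post_message`, and `timeline_for` are hypothetical, and a real application would issue the equivalent inserts through its Cassandra client rather than touching hashes.

```ruby
# In-memory stand-ins for column families (hypothetical names, not a Cassandra API).
# Each maps row key => { column name => value }.
FOLLOWERS     = Hash.new { |h, k| h[k] = [] }   # user id => ids of users who follow them
USER_MESSAGES = Hash.new { |h, k| h[k] = {} }   # author id => { timestamp => message }
TIMELINE      = Hash.new { |h, k| h[k] = {} }   # reader id => { "timestamp:author" => message }

# Write path: one insert into the author's own row, plus one per follower.
def post_message(author, timestamp, text)
  USER_MESSAGES[author][timestamp] = text
  # Zero-pad the timestamp so that string order matches chronological order.
  column = format('%013d:%s', timestamp, author)
  FOLLOWERS[author].each { |reader| TIMELINE[reader][column] = text }
end

# Read path: a single row lookup, with columns returned in column-name order.
def timeline_for(reader)
  TIMELINE[reader].sort.map { |column, text| [column.split(':', 2).last, text] }
end

FOLLOWERS['alice'] << 'carol'
FOLLOWERS['bob']   << 'carol'
post_message('alice', 1_326_730_000_000, 'hello from alice')
post_message('bob',   1_326_730_005_000, 'hello from bob')
post_message('alice', 1_326_730_009_000, 'alice again')

timeline_for('carol').each { |author, text| puts "#{author}: #{text}" }
```

The cost is paid on write (one insert per follower), which is the trade-off that makes the read a single-row fetch.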

Obviously if you support updating or deleting messages you will need to do that everywhere that there is a copy of the message. You will also need to consider what should happen when a user follows or unfollows someone. There are multiple solutions to this problem and your solution will depend on how you want your application to behave.
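
As one possible policy among those alternatives (an assumption for illustration, not a recommendation from this answer), following could backfill the followee's existing messages into the follower's row, and unfollowing could remove them again. The sketch below again uses in-memory hashes; `USER_POSTS`, `FEED`, `follow`, and `unfollow` are hypothetical names.

```ruby
# Hypothetical in-memory stand-ins for the column families; not a Cassandra API.
USER_POSTS = Hash.new { |h, k| h[k] = {} }   # author id => { timestamp => message }
FEED       = Hash.new { |h, k| h[k] = {} }   # reader id => { [timestamp, author] => message }

# On follow: copy the followee's existing messages into the follower's feed row.
def follow(reader, author)
  USER_POSTS[author].each { |timestamp, text| FEED[reader][[timestamp, author]] = text }
end

# On unfollow: drop every feed column that was written by that author.
def unfollow(reader, author)
  FEED[reader].delete_if { |(_timestamp, column_author), _text| column_author == author }
end

USER_POSTS['alice'][100] = 'first'
USER_POSTS['alice'][200] = 'second'
USER_POSTS['bob'][150]   = 'from bob'

follow('carol', 'alice')
follow('carol', 'bob')
unfollow('carol', 'alice')

p FEED['carol']   # only bob's message remains in carol's feed
```

An application could just as well leave old messages in place on unfollow, or skip the backfill on follow; which behavior is right depends on the product.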

psanford
  • Thanks for your answer, I'll try that. But how can I build this kind of key (TimeStamp:User) in cassandra-cql? Did you have some Ruby string operation in mind for assembling the column name, or is there a kind of composite key I can use with CQL? – user934801 Jan 16 '12 at 16:20
  • Cassandra supports composite column names, but they are not yet exposed via CQL ([currently targeted for the 1.1 release](https://issues.apache.org/jira/browse/CASSANDRA-2474)). So you could use the Thrift interface, or you could just use timestamps for the column names and serialize user_id+message in the body using your favorite format (JSON, Protocol Buffers, etc.). – psanford Jan 16 '12 at 16:27
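
The second option in that last comment (a plain timestamp as the column name, with the user ID and message serialized into the value) might look roughly like this in Ruby with JSON. The helper names `encode_column` and `decode_column` are hypothetical, and the sketch only builds and parses the name/value pair; how the pair is written to Cassandra is left to whichever client is in use.

```ruby
require 'json'

# Hypothetical helper: plain timestamp as the column name,
# author and text packed together into the column value as JSON.
def encode_column(timestamp, user_id, text)
  [timestamp.to_s, JSON.generate('user_id' => user_id, 'message' => text)]
end

# Hypothetical helper: turn a stored name/value pair back into its parts.
def decode_column(name, value)
  body = JSON.parse(value)
  { timestamp: Integer(name), user_id: body['user_id'], message: body['message'] }
end

name, value = encode_column(1_326_731_220, 'user934801', 'hello')
puts name
puts value
p decode_column(name, value)
```

One thing to watch with this layout: two messages that share the same timestamp in the same row would end up under the same column name, so a sufficiently fine-grained timestamp is assumed here.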