I am a little confused by clustering in Cassandra. I have an application that is very write-heavy and update-heavy. With a traditional relational database, I'd partition the data into two tables: one for data that changes infrequently, and one (with shorter rows) for the columns that change frequently.

For example:

create table user_def ( id int primary key, email list<varchar> ); -- stable
create table user_var ( id int primary key, state int ); -- changes all the time

But Cassandra seems to be optimized for accessing sparsely-populated columns, so I'm not sure there is any advantage in mimicking this approach for Cassandra schemas.

With Cassandra, is there any advantage in separating frequently-updated columns to a separate table/column-family (away from infrequently-updated columns) or should I combine all the columns together into one table/column-family? Do circumstances change if I have a compound primary key and clustering comes into play?

Mayur Patel

2 Answers

Cassandra treats primary keys like this:

The first element of the primary key (which can itself be a composite) is used to partition your data: it determines which node your data is stored on and which nodes it is replicated to. A partition always lives in its entirety on one node (and on its replica nodes). The other fields in the primary key are then used to sort the entries within that partition. [The first element of the primary key is called the partition key, while the other fields in the primary key are called clustering keys.]
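A minimal Python sketch of the behavior described above, assuming a made-up three-node cluster: the partition key alone decides which node holds the data, and the rows inside a partition come back ordered by clustering key. It uses md5 purely as a stable hash; Cassandra's real partitioner (Murmur3Partitioner by default), token ring and replication are not modeled, and `ToyTable`, `node_for` and `NUM_NODES` are hypothetical names.

```python
import hashlib

# Toy model only: NOT Cassandra's partitioner or storage engine.
NUM_NODES = 3  # assumed cluster size for the sketch


def node_for(partition_key):
    """Pick the owning node by hashing the partition key (md5 as a stable hash)."""
    digest = hashlib.md5(repr(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


class ToyTable:
    """Rows are grouped by partition key; each partition sits on exactly one node."""

    def __init__(self):
        # node -> {partition_key -> {clustering_key -> values}}
        self.nodes = {n: {} for n in range(NUM_NODES)}

    def insert(self, partition_key, clustering_key, values):
        partition = self.nodes[node_for(partition_key)].setdefault(partition_key, {})
        # Same full primary key twice = overwrite (upsert), as in Cassandra.
        partition[clustering_key] = values

    def read_partition(self, partition_key):
        # The whole partition comes from one node, returned in clustering order.
        partition = self.nodes[node_for(partition_key)].get(partition_key, {})
        return sorted(partition.items(), key=lambda item: item[0])


table = ToyTable()
for ck in (3, 1, 2):
    table.insert(42, ck, {"state": ck})
print(table.read_partition(42))
# [(1, {'state': 1}), (2, {'state': 2}), (3, {'state': 3})]
```

Writing the same partition key and clustering key again replaces the earlier values rather than adding a second row, which mirrors Cassandra's upsert semantics for a repeated primary key.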

Based on that, I'd say you might as well have a single table with id, state and email. It looks like you're using skinny rows (one row per partition), and I don't think you'd gain much, if anything, from creating separate tables.
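To make the single-table suggestion concrete, here is a toy Python model, assuming Cassandra's per-cell, last-write-wins storage: each write stores a (timestamp, value) cell only for the columns it names, and a read takes the newest cell per column. Under that model, frequent updates to `state` in a combined user(id, email, state) table never rewrite `email`, which is one plausible reason a separate table buys little. `CellStore` and its methods are hypothetical; compaction, replication and tombstones are omitted.

```python
import itertools

# Toy last-write-wins cell store, loosely modeled on how Cassandra records
# writes. Not a real storage engine: no compaction, replication or tombstones.
_clock = itertools.count(1)  # stand-in for write timestamps


class CellStore:
    def __init__(self):
        self.cells = []  # append-only log of (partition_key, column, timestamp, value)

    def write(self, partition_key, **columns):
        # An INSERT/UPDATE only produces cells for the columns it names.
        ts = next(_clock)
        for column, value in columns.items():
            self.cells.append((partition_key, column, ts, value))

    def read(self, partition_key):
        # Keep the newest cell per column (last write wins).
        latest = {}
        for pk, column, ts, value in self.cells:
            if pk == partition_key and (column not in latest or ts > latest[column][0]):
                latest[column] = (ts, value)
        return {column: value for column, (_ts, value) in latest.items()}

    def cells_for(self, partition_key, column):
        return sum(1 for pk, c, _, _ in self.cells if pk == partition_key and c == column)


# One combined table: user(id, email, state).
store = CellStore()
store.write(42, email=["a@example.com"], state=0)
for new_state in range(1, 4):
    store.write(42, state=new_state)  # frequent updates touch only `state`

print(store.read(42))  # {'email': ['a@example.com'], 'state': 3}
print(store.cells_for(42, "email"), store.cells_for(42, "state"))  # 1 4
```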

ashic

I had accepted ashic's answer until I came across this: http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets

which states (for delete-heavy access):

...consider partitioning data with heavy churn rate into separate rows and deleting the entire rows when you no longer need them. Alternatively, partition it into separate tables and truncate them when they aren’t needed anymore...

This falls under what the post calls the 'queue' anti-pattern for Cassandra.
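A rough Python sketch of why the post recommends dropping whole partitions (reading the quote's "rows" as partitions) instead of deleting items one by one, under a simplified model: in Cassandra a delete writes a tombstone rather than removing data in place, and reads must scan past those tombstones until compaction eventually purges them. `QueuePartition` is a hypothetical name, and compaction and `gc_grace_seconds` are ignored.

```python
# Toy model of a queue-like partition. Row deletes leave tombstones that later
# reads must scan past; a partition delete is a single marker. Compaction and
# gc_grace_seconds are deliberately ignored.

TOMBSTONE = object()  # sentinel standing in for a row tombstone


class QueuePartition:
    def __init__(self):
        self.entries = {}      # clustering key (seq) -> value or TOMBSTONE
        self.deleted = False   # partition-level tombstone

    def push(self, seq, value):
        self.entries[seq] = value

    def delete_row(self, seq):
        # The row is shadowed by a tombstone, not physically removed.
        self.entries[seq] = TOMBSTONE

    def delete_partition(self):
        self.deleted = True

    def read_first_live(self):
        """Return (oldest live value, number of entries scanned)."""
        if self.deleted:
            return None, 0
        scanned = 0
        for seq in sorted(self.entries):  # clustering order
            scanned += 1
            if self.entries[seq] is not TOMBSTONE:
                return self.entries[seq], scanned
        return None, scanned


# Consume 999 of 1000 queued jobs by deleting rows one at a time.
queue = QueuePartition()
for seq in range(1000):
    queue.push(seq, f"job-{seq}")
for seq in range(999):
    queue.delete_row(seq)
print(queue.read_first_live())  # ('job-999', 1000)

# Drop a whole partition instead once it is no longer needed.
old_batch = QueuePartition()
for seq in range(1000):
    old_batch.push(seq, f"job-{seq}")
old_batch.delete_partition()
print(old_batch.read_first_live())  # (None, 0)
```

In this model the read cost after row-by-row deletes grows with the number of consumed items, while the partition delete keeps it constant; real costs depend on compaction and `gc_grace_seconds`.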

Mayur Patel