
Say I have a dozen CF/SCFs and the write traffic keeps coming (possibly with spikes). Over time, a few of them will grow much faster than the others (due to their own nature), and their data tables could become huge. At that stage, should they still sit on the same disk as the other CF/SCFs? What if the disk is almost full because of the large amount of stored data? Or should we consider introducing additional CF/SCFs for storing historical data?

In general, what are the best practices we should follow to take care of historical data?


1 Answer


The size of the CF isn't really the issue, as the keys are replicated and spread based on the number of nodes, the token selection per node, the partitioner selected, and the replication strategy -- all configurable per keyspace.

sdolgy
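
For reference, the per-keyspace replication the answer mentions is fixed when the keyspace is created. A minimal CQL sketch, where the keyspace name `metrics` and the replication factor of 3 are illustrative assumptions rather than anything from the answer:

```
-- Sketch only: keyspace name and replication factor are assumptions.
-- Replica placement follows from the strategy chosen here, together
-- with the cluster's partitioner and per-node token assignment.
CREATE KEYSPACE metrics
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
```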
    Yeah, just make sure that you don't have individual *rows* growing without bound, and you should be good. (In our application, when we store time-series data related to an object, we typically use a row key of `objectkey-month-year` to limit how much an individual row grows; the column keys are TimeUUIDs.) –  Jan 12 '12 at 18:54
  • actually, it's more like `objectkey-year-month` so that they sort properly. typed that wrong. :) –  Jan 12 '12 at 19:42
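
Here is the bucketing pattern from the comments, sketched in CQL terms for illustration (the keyspace, table, and column names are assumptions): the `objectkey-year-month` row key becomes a composite partition key, and the TimeUUID column names become a clustering column, so no single partition grows without bound.

```
-- Sketch of the comments' pattern; all names here are illustrative.
-- Each (objectkey, bucket) pair is one partition, so a partition
-- only ever holds one object-month of data and stays bounded.
CREATE TABLE metrics.events (
    objectkey  text,
    bucket     text,       -- 'YYYY-MM', i.e. year-month, so buckets sort correctly
    event_time timeuuid,   -- the TimeUUID column key from the comments
    payload    blob,
    PRIMARY KEY ((objectkey, bucket), event_time)
);

-- Writing a sample event; now() generates a TimeUUID server-side.
INSERT INTO metrics.events (objectkey, bucket, event_time, payload)
VALUES ('sensor-42', '2012-01', now(), 0x00);
```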