I've recently been running performance tests with different table designs in Cassandra.
I currently use Cassandra in a write-intensive project. However, I'm going to add a read-intensive part that exports data using SELECT statements.
I'm storing time series data in the following table:
CREATE TABLE events (
date text,
n int, // it could be 1,2,3
id timeuuid,
PRIMARY KEY ((date, n), id)
);
I have date|n as the partition key. To improve read performance, I'm trying to take advantage of wide rows. The documentation states:
If the partition keys are the same, the rows are stored on the same physical node, widening that partition key's row.
Therefore, I use n to distribute the rows evenly and avoid hotspots, as stated here.
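To make the bucketing idea concrete, here is a minimal Python sketch of how a (date, n) partition key could be derived for each event. The helper name `partition_key`, the CRC32-based choice of n, and `NUM_BUCKETS = 3` are illustrative assumptions; they are not part of Cassandra or of the schema above.

```python
import uuid
import zlib
from typing import Tuple

NUM_BUCKETS = 3  # assumption: n is one of 1, 2, 3, as in the table comment


def partition_key(date: str, event_id: uuid.UUID,
                  num_buckets: int = NUM_BUCKETS) -> Tuple[str, int]:
    """Return a (date, n) pair for an event (hypothetical helper)."""
    # Hash the id so that n depends only on the event itself,
    # which spreads one day's events over num_buckets partitions.
    n = zlib.crc32(event_id.bytes) % num_buckets + 1
    return (date, n)


if __name__ == "__main__":
    for _ in range(5):
        event_id = uuid.uuid1()  # time-based UUID, similar in spirit to a timeuuid
        print(partition_key("2013-07-30", event_id), event_id)
```

With this scheme, every event that ends up with the same (date, n) pair shares one partition key, so I would expect those rows to be stored together.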
However, in a multi-node cluster, when I insert the following rows:
'2013-07-30'|1, some-timeuuid
'2013-07-30'|1, another-timeuuid
I see that they are not on the same physical node.
I get the node info with:
nodetool getendpoints keyspace columnfamily some-timeuuid
I want these rows to be in the same wide row to improve read performance, but not so wide that a row could reach 2 billion columns (which is the limit on the number of columns).
Any ideas what's going on here?