Cassandra composite, compound keys on multi-node cluster

Question

I've recently been running performance tests with different designs in Cassandra. I currently use Cassandra in a write-intensive project, but I'm going to add a read-intensive part that exports data using SELECT statements.

I'm storing time series data in the following table:

CREATE TABLE events (
  date text,
  n int, // it could be 1,2,3
  id timeuuid,
  PRIMARY KEY ((date, n), id)
);

I have date|n as the partition key. To improve read performance, I'm trying to take advantage of wide rows. The documentation states that:

If the partition keys are the same, the rows are inserted on the same physical node, widening that partition key's row.

Therefore, I use n to distribute the rows evenly and avoid hotspots, which is stated here;

However, in a multi-node cluster, when I insert the following:

'2013-07-30'|1, some-timeuuid 
'2013-07-30'|1, another-timeuuid

I see that they're not on the same physical node.

I get the node info with:

nodetool getendpoints keyspace columnfamily some-timeuuid

So I want them to be in the same row to improve read performance, but not so wide that a row approaches 2 billion columns (the limit on the number of columns in a row).

Any ideas what's going on here?

John · Accepted Answer · 2013-07-30T12:33:36.380

7

If I am not mistaken, to find out which nodes your rows are stored on, you need to run:

nodetool getendpoints keyspace columnfamily 2013-07-30:1

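Use your row key (the composite partition key) instead of your column key.

If you are using SimpleStrategy for token/replica calculation, this is what happens internally:

The MD5 hash of the key is computed (this is what RandomPartitioner does; other partitioners use a different hash). The tokens assigned to the nodes in the ring are put in a sorted list. The first token greater than or equal to the hash is found, wrapping around to the start of the list if there is none. The node owning that token is the first node. The next nodes in the list hold the remaining replicas, and how many there are depends on the replication factor (RF).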

(found this on the cassandra mailing list http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-to-find-what-node-a-key-is-on-td6202253.html)
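As a rough sketch of the lookup described above (not Cassandra's actual code), the ring walk could look like this in Python. The names `md5_token` and `owners_for_token`, the node names, and the token values are all made up for illustration. The sketch also hashes the plain string `2013-07-30:1` and treats the digest as an unsigned integer, which is a simplification: Cassandra serializes a composite partition key in its own binary format before hashing, so the tokens here will not match a real cluster.

```python
import bisect
import hashlib


def md5_token(key: bytes) -> int:
    # Simplified RandomPartitioner-style token: the MD5 digest read as an
    # unsigned 128-bit integer (assumption; not byte-for-byte what Cassandra does).
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def owners_for_token(token: int, ring: dict, rf: int) -> list:
    # ring maps each node's token to a (hypothetical) node name.
    tokens = sorted(ring)
    # Index of the first token >= the key's token, wrapping to 0 past the end.
    start = bisect.bisect_left(tokens, token) % len(tokens)
    # Walk clockwise, taking as many nodes as the replication factor allows.
    count = min(rf, len(tokens))
    return [ring[tokens[(start + k) % len(tokens)]] for k in range(count)]


if __name__ == "__main__":
    # Three hypothetical nodes spaced evenly over the 128-bit token space.
    ring = {i * (2**128 // 3): f"node{i + 1}" for i in range(3)}
    key = b"2013-07-30:1"
    # Same key bytes always give the same token, hence the same nodes.
    print(owners_for_token(md5_token(key), ring, rf=2))
```

The point of the sketch is that only the partition key feeds the hash, so every row with the same (date, n) pair resolves to the same nodes regardless of its timeuuid.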

(Updated my answer according to the comment.)

edited Jul 30 '13 at 12:33

answered Jul 30 '13 at 12:26


3

The composite separator is :, and you shouldn't put quotes round the string since they'll be included in the key. So the first example should be `nodetool getendpoints keyspace columnfamily 2013-07-30:1`. – Richard Jul 30 '13 at 12:32
1

Apparently, `nodetool getendpoints` also gives output for non-existing keys. It outputs the physical node IP after running an algorithm on the key. You're right about the usage of `getendpoints`. – aacanakin Jul 30 '13 at 12:50
