At Cassandra, I do not know how to do ORDER BY

Question

I prepared the following table "keyspaceB.memobox"

DROP TABLE IF EXISTS keyspaceB.memobox;
CREATE TABLE IF NOT EXISTS keyspaceB.memobox (
    pkey1 text,
    pkey2 text,
    id timeuuid,
    name text,
    memo text,
    date timestamp,
    PRIMARY KEY ((pkey1, pkey2),id,name)
) WITH CLUSTERING ORDER BY (id DESC,name DESC);

And I registered the following data.

INSERT INTO memobox (pkey1,pkey2,id,name,memo,date) VALUES ('a','b',now(),'tanaka','greet message1','2016-12-13');
INSERT INTO memobox (pkey1,pkey2,id,name,memo,date) VALUES ('a','b',now(),'yamamoto','greet message2','2016-12-13');

The following will succeed

SELECT * FROM memobox where pkey1='a' and pkey2='b' ORDER BY id;

However, the following will fail. I would like to ask your professor what is wrong.

SELECT * FROM memobox where pkey1='a' and pkey2='b' ORDER BY name;

■error

cqlsh:keyspaceb> SELECT * FROM memobox where pkey1='a' and pkey2='b' ORDER BY name;
InvalidRequest: code=2200 [Invalid query] message="Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY"
cqlsh:keyspaceb>

score 4 · Accepted Answer · edited May 23 '17 at 10:30

There are two different types of keys in cassandra, partition key and clustering key. The partition key determines which node the data gets stored, while the clusterning key determines the order in which the data gets stored in that parition(node).

In your case the partition key is pkey1 and pkey2. and the clustering key is id and name.

so the data in a partition will be stored based on the id and then name.

e.g if we have the following data

id |name 
1  | abc
1  | xyz
2  | aaa

In this case the row with id 1 is stored first, also if two rows have the same id then the order is decided by name column.

So when you query the data like this

SELECT * FROM memobox where pkey1='a' and pkey2='b' ORDER BY id;

cassandra finds the partitoin using pkey1 and pkey2 (aka the partition key) and then just return the data how it is stored on the disk.

However in the second case

SELECT * FROM memobox where pkey1='a' and pkey2='b' ORDER BY name;

since the data is not ordered by name alone,( it is first ordered by id and then by name). cassandra can not just blindly return the results , it has to do a lot more in order to correctly sort the results. Hence due to performance reasons this is not allowed.

That is why in the order by clause you have to specify the clustering columns in the order in which you specify them while creating the table(id and then name).

This is from another answer by @aaron Where and Order By Clauses in Cassandra CQL

Cassandra achieves performance by using the clustering keys to sort your data on-disk, thereby only returning ordered rows in a single read (no random reads). This is why you must take a query-based modeling approach (often duplicating your data into multiple query tables) with Cassandra. Know your queries ahead of time, and build your tables to serve them.

Thanks for your detailed answer !! I appreciate. It was very helpful. — Hidetoshi Maekawa, Dec 20 '16 at 04:52

At Cassandra, I do not know how to do ORDER BY

1 Answers1