Cassandra SELECT on 2ndary index always sorted on partition key?

Question

Say I have the following table and secondary indices defined:

CREATE TABLE ItemUpdates (
    time         timestamp,
    item_name    text,
    item_context text,
    item_descr   text,
    tags         map<text, int>,
    PRIMARY KEY ((time, item_name, item_context))
);

CREATE INDEX ItemUpdateByName
    ON ItemUpdates(item_name);

CREATE INDEX ItemUpdateByContext
    ON ItemUpdates(item_context);

CREATE INDEX ItemUpdateByTag
    ON ItemUpdates(KEYS(tags));

General background information on the data model: an item has a unique name within a context, so (item_name, item_context) is a natural key for items. Tags have some value associated with them.

A natural query in my application is "show me all updates on item X with a certain tag". This translates to:

SELECT * FROM ItemUpdates
    WHERE item_name='x'
        AND item_context='a'
        AND tags CONTAINS KEY 't';

When I try some queries I notice that although the cluster uses the Murmur3Partitioner, the results come ordered by time. This makes sense when you consider that Cassandra stores secondary indices as wide rows, and that colums are ordered by their name.

(1) Does Cassandra always return rows sorted by partition key when selecting on a(n) (set of) indexed column(s)?

The reason I find this interesting is that other natural queries in my application include:

fetch all updates on item X, since date D
fetch the 300 most recent updates on item X

What surprises me is that adding a clause ORDER BY time DESC to my select statement on ItemUpdates results in an error message "ORDER BY with 2ndary indexes is not supported."

(2) (How) can I do a range query on the partition key when I narrow the query by selecting on an indexed column?

score 2 · Accepted Answer · answered Nov 04 '14 at 10:25

2

The only natural "auto" sorting that you should get on cassandra is for columns in a wide row. partitions when using murmur3 are not "sorted" as that would mess up the random distribution (afaik). Indexes are stored on each node in a "hidden" table as wide rows. When on filter on an index, it's hitting that "partition" "on the node" and the values are the rows in that partition (which correspond to matching rows on that node). Try your query using different data sets and different columns. Maybe the data you have cause the results to be sorted.

(2) As it stands, you can only do range queries on clustering keys, not on the partition key. In general, for efficient querying, you should attempt to hit one (or a few) partitions, and filter on indexes / filter on clustering keys / range query on the clustering key. If you attempt to not hit a partition, it becomes a clusterwide operation, which isn't usually great. If you are looking to do cluster wide analysis (ala map reduce style), take a look at Apache Spark. Spark cassandra integration is quite good and is getting better.

answered Nov 04 '14 at 10:25

ashic

6,367
5
33
54

Thanks. So the answer to (1) is "no". Checked your suggestion and indeed: the results didn't come back sorted. Appears to me I still don't fully understand the internals of 2ndary indices. I thought that an index is CF of wide rows with the indexed value as the wide row's partition key, and the partition keys of the 'target' rows as column names (with no value). Result would be that the results of an index lookup would be a sorted(!) list of partition keys (because columns are ordered by name). – Rinke Nov 04 '14 at 10:59
The first version of ItemUpdates had `((item_name, item_context), time)` as PK. The bulleted queries from my question would then be straightforward of course. But doing some calculations I learned that the rows would become too wide after some time, so I needed to partition differently. Any suggestions? – Rinke Nov 04 '14 at 11:08
You could introduce bucketing. For example, ((name, context, year), time). That's give you one wide row per year. If that's too much, you could do it by month+year: ((name, context, month_year), time). Might mean more queries from client side, but usually n*2ms queries is cheaper than a large map reduce. – ashic Nov 04 '14 at 11:12
Nice insight. So you're saying that the pattern for efficient querying is: first select on partition key and then filter further using clustering key or secondary indices, right? – Rinke Nov 04 '14 at 11:23
Most efficient is specifying partition + clustering key. The next most efficient is parition key + (clustering and / or index). Then next most efficient is a few partition keys (with IN statement). This depends a bit on wide row size. Not specifying a partition key is a cluster wide operation, and usually it's better to use Spark for those (analysis type) things. – ashic Nov 04 '14 at 11:28
With "clustering" you mean range queries on the CK? – Rinke Nov 04 '14 at 11:41
Clustering keys are the keys in the primary key that are not the partition key. So, ((a, b), c, d) pk will have c and d as clustering keys. Clustering keys allow range queries if all the "prior" keys are specified. This makes sense if you think about it as data is arranged inside a partition ordered by each susccessive clustering key. So, in the previous example, for each partition, data will be sorted by c, and within c data will be sorted by d. – ashic Nov 04 '14 at 12:27

Cassandra SELECT on 2ndary index always sorted on partition key?

1 Answers1