How do multiple Cassandra secondary indices work?

Question

Since Cassandra does not expose an execution plan, we are wondering how multiple secondary indices work together. That is, if a query filters on several indexed columns, possibly listed in a different order, which secondary index is preferred, and why?

We know secondary indices are considered bad practice and are best suited to low-cardinality columns (many duplicate values). However, we are trying to leverage existing legacy Cassandra tables and cannot use Cassandra secondary indices and SOLR indices at the same time, so we have no other option here.

Not much is discussed here either: http://www.datastax.com/docs/1.1/ddl/indexes

I would also like to know how multiple indices are actually represented, given that each index only works as a map from a value to multiple row ids for the data within the same node. I would also like to know how data is retrieved: does each node produce its own set of matching rows (and if so, which index column is used first), with the rows from all nodes then combined? - kisna, Aug 05 '14 at 02:38

score 0 · Answer 1 · answered Aug 05 '14 at 04:12

Secondary indexes are like lookup tables you would otherwise create yourself, except that Cassandra manages them. Each node stores index information only for the rows it contains. An update to the index on a node and the update of the data on that node are applied atomically. If your query filters on multiple indexed columns, only one index will actually be used. I hope somebody can correct me on this, but from what I can tell, the first filter in your predicate is the one that will be used.
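To make the "lookup table per node" picture concrete, here is a minimal Python sketch of a single node. It is a toy model and not Cassandra's actual implementation: the `Node` class, its methods, and the column names are all hypothetical. Which index gets picked is left as an explicit `use_index` parameter, since the answer above is itself unsure about that rule.

```python
from collections import defaultdict


class Node:
    """Toy model of one node: its local rows plus one local index per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = {}  # row key -> {column: value}
        # Each index maps a column value to the set of local row keys holding it.
        self.indexes = {c: defaultdict(set) for c in indexed_columns}

    def write(self, key, row):
        # Drop stale index entries for an overwritten row, then index the new values.
        old = self.rows.get(key)
        if old is not None:
            for col, idx in self.indexes.items():
                if col in old:
                    idx[old[col]].discard(key)
        self.rows[key] = dict(row)
        for col, idx in self.indexes.items():
            if col in row:
                idx[row[col]].add(key)

    def query(self, predicates, use_index):
        # Only one index supplies candidate keys; every predicate is then
        # re-checked against the candidate rows themselves.
        candidates = self.indexes[use_index].get(predicates[use_index], set())
        return sorted(
            k for k in candidates
            if all(self.rows[k].get(c) == v for c, v in predicates.items())
        )


node = Node(["state", "status"])
node.write("u1", {"state": "TX", "status": "active"})
node.write("u2", {"state": "TX", "status": "inactive"})
node.write("u3", {"state": "CA", "status": "active"})
result = node.query({"state": "TX", "status": "active"}, use_index="state")
```

In this model the result is the same whichever index is chosen; only the number of candidate rows that have to be read and filtered changes.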

Don't think of indexes as global lookups (in the general case). This will lead to annoying performance problems, etc. Think of indexes as a way of quickly getting to some columns inside of a partition where the column you want an equality filter on isn't the clustering key (or you want to be able to filter on the second clustering key without specifying the first one). If the query is restricted to a single partition, then index performance is usually not bad. The information about low cardinality is correct: the higher the cardinality, the worse your index will perform.
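The difference between a partition-scoped index query and a "global" one can be sketched as follows. This is a simplified illustration that ignores replication and real token ranges; `NODES`, `owner`, and `nodes_contacted` are hypothetical names, and the MD5-modulo placement is only a stand-in for the partitioner.

```python
import hashlib

NODES = 4  # hypothetical cluster size


def owner(partition_key):
    # Stand-in for the partitioner: hash the partition key to a node id.
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODES


def nodes_contacted(partition_key=None):
    # With a partition key, only the owning node consults its local index.
    # Without one, every node has to be asked, because each node only
    # indexes the rows it stores.
    if partition_key is not None:
        return [owner(partition_key)]
    return list(range(NODES))


scoped = nodes_contacted("user-42")  # a single node
unscoped = nodes_contacted()         # all nodes
```

Under these assumptions the scoped query touches one node, while the unscoped query fans out to the whole cluster, which is why treating indexes as global lookups tends to perform poorly.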

Here's a short FAQ on indexes: http://wiki.apache.org/cassandra/SecondaryIndexes

I know that the partition keys and their data can be represented as a SortedHashMap with an O(1) lookup of the partition given a token value; I wonder what the structure of a secondary index would be. If it is true that the first index column is used, should we choose the one with the highest cardinality (many unique values) or the lowest cardinality (very few values)? Also, where is it documented that the first index column is used when multiple secondary indices are specified? And how are the lookups made to match the values for all secondary indices? - kisna, Aug 05 '14 at 05:45
