Secondary indexes should be used only in specific cases, like, when you use them together with condition on partition key column, you have correct cardinality for data, etc.
For example, if we have following table:
create table test.test (
pk int,
c1 int,
val1 int,
val2 int,
primary key(pk, c1));
and you created a secondary index on the column val2
, then following query will be very effective:
select * from test.test where pk = 123 and val2 = 10
because you restricted the execution of query only to the nodes that are replicas for pk
with value 123
.
But if you do
select * from test.test where val2 = 10
then Cassandra will need to go to the every node, and ask for data there - it will be much slower, and put a pressure to coordinating node.
Standard secondary indexes have other limitations, such as, search only for specific values, problems when column has very low or very high cardinality, etc. SASI indexes are better from design standpoint, although they are still experimental, and have problems with implementation.
You can find technical details about implementation of secondary indexes in the following blog post.
DataStax has other implementations in the commercial offering:
- DSE Search that is based on the Apache Solr, so you get a lot of flexibility (full text search, range queries, etc.)
- new implementation called SSTable Attached Indexes (SAI) - they are currently marked as beta, but they provide more flexibility than standard secondary indexes, with less overhead than DSE Search