How scalable are automatic secondary indexes in Cassandra 0.7?

Question

As far as I understand, automatic secondary indexes are built over node-local data.

In that case, a query by secondary index has to involve every node that stores part of the column family in order to get results (?). So, if I am right, when the data is spread across 50 nodes, are all 50 nodes involved in a single query?

How far can this scale? Is it more scalable than manual secondary indexes (an inverted-index column family)? To a few nodes, or to hundreds of nodes?

score 4 · Accepted Answer · answered Feb 23 '11 at 10:31


See Stu's answer on the mailing list: http://www.mail-archive.com/user@cassandra.apache.org/msg10506.html


Schildmeijer


I just copied my question to this mailing list yesterday ;) – Feb 23 '11 at 15:10

score 1 · Answer 2 · answered Sep 06 '11 at 09:52

Yes, if you need to fetch all indexed rows, then the index query involves all nodes. But this is actually more efficient than building your own index! Details here.

However, if you look up only a few rows, and each index entry maps to very many rows, then it's likely that the very first node can answer your query on its own. Your query will then involve only one node. From the Apache mailing list:

The first node can answer the question as long as you've requested less rows than the first node has on it. Hence the "low cardinality" point in what you quoted.

(by Jonathan Ellis, here.)
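To make the quoted point concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions: it models each node as a dictionary holding its own local index, and it assumes nodes are visited one after another until the requested row count is reached. The names `indexed_query` and `make_cluster` are made up for illustration and are not part of any Cassandra API.

```python
from typing import Dict, List, Tuple

# Toy model (an assumption, not Cassandra code): each node keeps a local
# index mapping an indexed value to the row keys stored on that node.
Node = Dict[str, List[str]]


def indexed_query(nodes: List[Node], value: str, limit: int) -> Tuple[List[str], int]:
    """Visit nodes in order and stop once `limit` rows have been collected.

    Returns the matching row keys and the number of nodes contacted.
    """
    rows: List[str] = []
    contacted = 0
    for node in nodes:
        contacted += 1
        rows.extend(node.get(value, []))
        if len(rows) >= limit:
            break  # enough rows found, no need to ask further nodes
    return rows[:limit], contacted


def make_cluster(n_nodes: int, rows_per_node: int, value: str) -> List[Node]:
    """Build a hypothetical cluster where every node holds matching rows."""
    return [
        {value: [f"n{i}-r{j}" for j in range(rows_per_node)]}
        for i in range(n_nodes)
    ]


# Low-cardinality index: each of 50 nodes has 1000 rows for the same value.
cluster = make_cluster(50, 1000, "state=CA")

# Asking for fewer rows than the first node holds touches one node only.
few_rows, few_contacted = indexed_query(cluster, "state=CA", limit=100)

# Asking for every indexed row touches all 50 nodes.
all_rows, all_contacted = indexed_query(cluster, "state=CA", limit=50_000)

print(few_contacted, all_contacted)
```

In this model the number of nodes contacted depends on how the requested row count compares with the matches available per node, which is why the "low cardinality" case can stay cheap while a fetch of all indexed rows fans out to the whole cluster.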

(I also posted a question on the mailing list, a follow up question to your question, inquisitor, because I didn't really understand the answer to your question (linked in Schildmeijer's answer).)
