
We have been using Cassandra 0.6 and now have Column Families with millions of keys. We are interested in using the new Secondary Index feature available in 0.7, but couldn't find any documentation on how the new index is stored.

Is there any disk-space limitation, or is the index stored like the keys, spread over multiple nodes?

I've tried combing through the Cassandra site for an answer but to no avail.

Templar
user574793

1 Answer


Secondary indexes are stored as Column Families that are not accessible by the user. Their size will roughly be:

(number of distinct indexed values * average size of an indexed value) + (number of rows in the indexed column family * average size of a row key in that column family).
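As a rough worked example of that formula, the sketch below plugs in made-up numbers. The function name and all figures are hypothetical, and the estimate ignores per-column and SSTable overhead (timestamps, bloom filters, index files), so real usage will be somewhat higher.

```python
def estimate_index_size(distinct_values: int, avg_value_size: float,
                        indexed_rows: int, avg_key_size: float) -> float:
    """Rough secondary-index size in bytes, following the formula above.

    Hypothetical helper; ignores per-column and SSTable overhead.
    """
    # One index row per distinct indexed value, keyed by that value.
    value_part = distinct_values * avg_value_size
    # Every row key of the indexed CF appears once, under the value it holds.
    key_part = indexed_rows * avg_key_size
    return value_part + key_part


# Hypothetical CF: 5 million rows with 16-byte keys, indexed on a 'state'
# column that has 50 distinct values averaging 10 bytes each.
size = estimate_index_size(50, 10, 5_000_000, 16)
print(f"{size / 1024**2:.1f} MiB")  # 76.3 MiB
```

The row-key term dominates as soon as the indexed column has low cardinality, which is the typical use case for these indexes.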

Nodes only index rows that are stored locally -- that is, only rows for which they are a replica.
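To make that point concrete, here is a toy model in Python, not Cassandra's actual code. The ring tokens, the "owner plus next rf - 1 nodes" replica rule, and the names `ToyNode`, `ToyCluster` and `token_for` are all hypothetical simplifications. In this model each replica builds an index over only the rows it holds, so an index lookup has to ask nodes covering the whole ring and merge their answers.

```python
import bisect
import hashlib
from collections import defaultdict

RING_SIZE = 2**32


def token_for(key: str) -> int:
    # Hash the row key onto the ring (loosely modeled on an MD5-based partitioner).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE


class ToyNode:
    def __init__(self, name: str, token: int):
        self.name = name
        self.token = token
        self.rows = {}                 # row key -> row (dict of columns)
        self.index = defaultdict(set)  # indexed value -> row keys stored HERE

    def store(self, key: str, row: dict, indexed_column: str) -> None:
        self.rows[key] = row
        # The node indexes only rows it holds as a replica.
        if indexed_column in row:
            self.index[row[indexed_column]].add(key)


class ToyCluster:
    def __init__(self, tokens, rf: int, indexed_column: str):
        self.nodes = sorted((ToyNode(f"node{i}", t) for i, t in enumerate(tokens)),
                            key=lambda n: n.token)
        self.rf = rf
        self.indexed_column = indexed_column

    def replicas(self, key: str):
        # The first node whose token is >= the key's token owns it (wrapping
        # around the ring); the next rf - 1 nodes clockwise hold the copies.
        tokens = [n.token for n in self.nodes]
        start = bisect.bisect_left(tokens, token_for(key)) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]

    def insert(self, key: str, row: dict) -> None:
        for node in self.replicas(key):
            node.store(key, row, self.indexed_column)

    def query(self, value) -> set:
        # No single node sees the whole index, so every node is consulted
        # and the local answers are merged.
        hits = set()
        for node in self.nodes:
            hits |= node.index.get(value, set())
        return hits


cluster = ToyCluster(tokens=[0, RING_SIZE // 4, RING_SIZE // 2, 3 * RING_SIZE // 4],
                     rf=2, indexed_column="state")
for i in range(100):
    cluster.insert(f"user{i}", {"state": "TX" if i % 2 else "CA"})

print(len(cluster.query("TX")))  # 50
for node in cluster.nodes:
    print(node.name, sum(len(keys) for keys in node.index.values()), "index entries")
```

With a replication factor of 2, the per-node index entries add up to twice the number of rows, since each replica indexes its own copy.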

Tyler Hobbs
  • Hello Tyler Hobbs, this is a very interesting and informative posting. You talk about "Indexes are stored as CFs": does this mean ALL indexes are stored under ONE new CF, or does this mean EVERY index is stored as its OWN CF (with a single row)? Thanks!! – Markus Jun 23 '11 at 01:32
  • Every index is stored as its own CF. – Tyler Hobbs Jun 27 '11 at 19:36
  • Is it stored in the data directory? I mean, can I see the size difference? – samarth Nov 02 '11 at 06:23
  • Yes, you can see the SSTable files for the index in the data directory. If you've supplied a 'name' for the index, that name should be in the SSTable filename. – Tyler Hobbs Nov 02 '11 at 15:26
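Building on that last comment, a minimal sketch for measuring an index's on-disk footprint: it sums the sizes of files under a data directory whose names contain the index name. It assumes, as stated above, that a named index's SSTable filenames include that name; the exact naming scheme varies between Cassandra versions, and `index_sstable_bytes` is a hypothetical helper. Because it is a plain substring match, a distinctive index name avoids counting unrelated files.

```python
import os


def index_sstable_bytes(data_dir: str, index_name: str) -> int:
    """Total size in bytes of files under data_dir whose names contain index_name.

    Assumes the index's SSTable filenames include the index name; the exact
    naming scheme depends on the Cassandra version.
    """
    total = 0
    for root, _dirs, files in os.walk(data_dir):
        for fname in files:
            if index_name in fname:
                total += os.path.getsize(os.path.join(root, fname))
    return total
```

For example, calling it with a keyspace's data directory (such as a hypothetical `/var/lib/cassandra/data/MyKeyspace`) and the index name before and after building the index shows the size difference on that node.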