3

Lets say there is a table with 3 columns A,B, and C. A is primary key. I have 2 types of query, one that searches by A and B and another that searches by A and C. Is it better to add a secondary index for C to search based on A and C or make a new table with A, C, and B columns.

To put it in different perspective, in general it is a bad idea to have two secondary indexes on two columns and have a where clause conditioning on both indexes. Is it the same case for combining primary key and a secondary index?

https://www.youtube.com/watch?v=CbeRmb8fI9s#t=56

https://www.youtube.com/watch?v=N6UY1y3dgAk#t=30

Navnav
  • 982
  • 3
  • 11
  • 25

1 Answers1

5

Secondary indexes almost never aid performance, they are mostly a tool of convince for allowing queries to explore your data. Almost all performance gains come from properly structuring your primary key and creating data schemas which properly model the queries you want to perform.

So having two tables A by B and A by C would most likely be the ideal solution and will actually scale with your data.

RussS
  • 16,476
  • 1
  • 34
  • 62
  • You are right Russ, this video https://www.youtube.com/watch?v=N6UY1y3dgAk#t=30 made me think about it differently, but there is no explanation for it. – Navnav Feb 08 '14 at 06:37
  • For a few reasons, but the biggest is that secondary indexes end up being distributed throughout your cluster. Because of this you end up requiring data from every node when performing secondary index queries. Here is a site with a good runthrough http://www.wentnet.com/blog/?p=77 – RussS Feb 09 '14 at 18:36
  • That's right but the video mentions that mixing partition key and index can be beneficial in some cases, I'd like to know what are those cases and why/how it can be beneficial. – Navnav Feb 10 '14 at 19:33
  • When you have a small enough cluster that requesting data from every node to complete a query isn't a holdup. Basically just when you are starting out or if you are running a query that is hardly ever actually performed. Bottom line is, if you are relying on Cassandra for horizontal scaling, don't rely on secondary indexes. – RussS Feb 10 '14 at 20:54