Cassandra - WHERE clause with non primary key disadvantages

Question

I am new to Cassandra and I am using it for analytics tasks (good indexing needed).

I read in this post (and others), "cassandra, select via a non primary key", that I can't query my DB by non-primary-key columns in a WHERE clause.

To do so, it seems that there are three possibilities (all with major disadvantages):

1. Create a secondary index (not recommended because of performance issues).
2. Create a new table (I don't want redundant data, even if it's OK with Cassandra).
3. Put the column I want to query by in the primary key. In this case I need to specify all the parts of the primary key in my WHERE clause, and I can't use any operator other than IN or =.

Is there another way to do what I am trying to do (a WHERE clause on a non-primary-key column) without the three constraints above?
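
As a rough illustration, the three options listed in the question might look like this in CQL. The `events` table and every column, index, and table name below are hypothetical; none of them come from the question.

```cql
-- Hypothetical base table (all names are illustrative assumptions).
CREATE TABLE events (
    event_id uuid,
    category text,
    created  timestamp,
    payload  text,
    PRIMARY KEY (event_id)
);

-- Option 1: secondary index on a non-primary-key column.
CREATE INDEX events_category_idx ON events (category);
SELECT * FROM events WHERE category = 'click';

-- Option 2 (or option 3, if the original table is redefined this way):
-- a table whose primary key contains the column being filtered on.
CREATE TABLE events_by_category (
    category text,
    created  timestamp,
    event_id uuid,
    payload  text,
    PRIMARY KEY ((category), created, event_id)
);

-- In a normal query the partition key (category) takes = or IN,
-- but a clustering column such as created also accepts range operators.
SELECT * FROM events_by_category
WHERE category = 'click' AND created >= '2016-02-01';
```

In this sketch the `=`/`IN` restriction applies to the partition key. Clustering columns can be range-filtered as long as the clustering columns before them are restricted by equality.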

Cassandra really isn't a good fit for the use case that you are describing. It sounds like you need query flexibility, and you simply will not get that out of Cassandra. The bottom line is that the recommendation to create query tables (with redundant data) ***is a scalable solution***, whereas trying to use Cassandra like a relational database is not. – Aaron Feb 21 '16 at 17:50
Hi @Aaron, oops. The problem is that `mongodb` is recommended over `cassandra` for query flexibility, but `read/write` performance is highly important in my case, and `mongodb` is very bad on that point. – farhawa Feb 21 '16 at 19:00
And the only way you will ever see that performance is to take a query-based modeling approach using redundant data. Cassandra performs pretty terribly when you try to use a relational model or similar methods to achieve query flexibility. – Aaron Feb 21 '16 at 19:27
I would suggest watching this DataStax course on data modeling. Along with the Core Concepts course, it provides a pretty solid foundation: https://academy.datastax.com/courses/ds220-data-modeling – bechbd Feb 21 '16 at 19:44

score 7 · Accepted Answer · answered Feb 20 '16 at 18:45

From within Cassandra itself, you are limited to the options that you have listed above. If you want to know why, take a look here:

A Deep Look to the CQL Where Clause

However, if you are trying to run analytics on information stored within Cassandra, have you looked at using Spark? Spark is built for large-scale data processing on distributed systems. In fact, you could look at using DataStax (see here), which has some nice integration features between Spark and Cassandra, specifically for loading and saving data. It has both free (Community) and paid (Enterprise) editions.

answered Feb 20 '16 at 18:45 by bechbd

Hi @bechbd, thank you for your response. I have an indexing problem here: how can Spark load the data without the constraints I mentioned above? – farhawa Feb 20 '16 at 20:07
You will have to load the data into a Spark RDD subject to the limitations described in the link above. Once the data is in Spark, you can use filters, map/reduce, and range operations to narrow the large data set down to what you are looking for. The short answer to your indexing question is that what you are trying to do goes against one of the fundamental ways Cassandra is architected. AFAIK there is no way around these limitations in Cassandra 2.x. If you are using Cassandra 3.x, you can look at using a materialized view; however, those introduce their own complications. – bechbd Feb 21 '16 at 00:09
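
A minimal sketch of the materialized view mentioned in the comment above, assuming Cassandra 3.0 or later. The `events` table and the view name are hypothetical.

```cql
-- Hypothetical base table (names are illustrative assumptions).
CREATE TABLE events (
    event_id uuid,
    category text,
    payload  text,
    PRIMARY KEY (event_id)
);

-- Cassandra maintains the view from writes to the base table.
-- Every primary key column of the view needs an IS NOT NULL restriction,
-- and the view key must include the base table's primary key columns.
CREATE MATERIALIZED VIEW events_by_category_mv AS
    SELECT * FROM events
    WHERE category IS NOT NULL AND event_id IS NOT NULL
    PRIMARY KEY (category, event_id);

-- The view can now be queried by the former non-key column.
SELECT * FROM events_by_category_mv WHERE category = 'click';
```

The view is still redundant data on disk; the difference from a hand-built query table is that Cassandra, not the application, keeps it in sync.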

score 1 · Answer 2 · answered Sep 24 '19 at 12:04

Please try using IF in your query:

UPDATE [keyspace_name.] table_name
[USING TTL time_value | USING TIMESTAMP timestamp_value]
SET assignment [, assignment] . . . 
WHERE row_specification
[IF EXISTS | IF condition [AND condition] . . .] ;

see https://docs.datastax.com/en/archived/cql/3.3/cql/cql_reference/cqlUpdate.html

answered Sep 24 '19 at 12:04 by Matthew I.

`IF` doesn't have any relation to what the author of the question is asking for... – Alex Ott Sep 24 '19 at 12:29
This can work: provide all required primary key columns in the WHERE clause, and put any non-primary-key column in the IF condition clause. Caution: your Cassandra table definition must allow the primary key combination in the WHERE clause to identify a unique row. If you need a non-primary-key column in the WHERE clause, then the whole table definition should be rethought, since in Cassandra tables must be designed around the query requirements. – Pankaj Gupta Jun 25 '20 at 09:58
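
To make the discussion in these comments concrete, here is a sketch of a conditional update, assuming a hypothetical `events` table. The `IF` clause decides whether a write to one fully identified row is applied; it does not search for rows by a non-primary-key column.

```cql
-- Hypothetical table (names are illustrative assumptions).
CREATE TABLE events (
    event_id uuid,
    category text,
    PRIMARY KEY (event_id)
);

-- The WHERE clause must still identify the row by its full primary key.
-- The IF condition on the non-key column only gates the write.
UPDATE events
SET category = 'archived'
WHERE event_id = 123e4567-e89b-12d3-a456-426655440000
IF category = 'click';
```

The result of such a statement includes an `[applied]` column reporting whether the condition held, so it is a conditional write rather than a read-side filter.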

score 0 · Answer 3 · answered Feb 21 '16 at 21:40

I assume that the table is designed for a different purpose given that the fields you want to query by are not part of the partitioning key. My suggestion would be to duplicate the table and key it by the fields you want to query it by. I would recommend designing a new table for the exact purpose you will use it for as per Data modeling concepts.

Cassandra offers several advantages, such as linear scaling, by imposing certain restrictions on what you can do with CQL.
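
If the table is duplicated as suggested, the application has to write to both copies. One possible sketch uses a logged batch so that both inserts are eventually applied together. The table and column names are hypothetical.

```cql
-- Hypothetical original table, keyed for its original purpose.
CREATE TABLE events (
    event_id uuid,
    category text,
    payload  text,
    PRIMARY KEY (event_id)
);

-- Hypothetical duplicate, keyed by the field you want to query by.
CREATE TABLE events_by_category (
    category text,
    event_id uuid,
    payload  text,
    PRIMARY KEY ((category), event_id)
);

-- Write the same row to both tables in one logged batch.
BEGIN BATCH
    INSERT INTO events (event_id, category, payload)
    VALUES (123e4567-e89b-12d3-a456-426655440000, 'click', 'home page');
    INSERT INTO events_by_category (category, event_id, payload)
    VALUES ('click', 123e4567-e89b-12d3-a456-426655440000, 'home page');
APPLY BATCH;

-- The duplicate serves the query the original table cannot.
SELECT * FROM events_by_category WHERE category = 'click';
```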

score 0 · Answer 4 · answered Jun 11 '18 at 07:07

I had a similar issue while using Cassandra 2.x. Upgrade to Cassandra 3.0 or above; this was the only solution for me.

answered Jun 11 '18 at 07:07 by coder
