0

I have a problem with cassandra ( ScyllaDB ( indexes not supported ! ) )

in my scenario i have a table with three columns

CREATE TABLE test (a text , b text , c text , PRIMARY KEY ( a , b ,c ) );

Now i want to select data's by the second cluster key ( c ) but b is needed.

My goal but incorrect query is :

SELECT * FROM test WHERE c='...' ALLOW FILTERING

And correct query is this :

SELECT * FROM test WHERE b='...' AND c='...' ALLOW FILTERING

is there any solution for my problem to only use b cluster key in select query ?

KoLiBer
  • 194
  • 4
  • 15

3 Answers3

5

You can query on

  • a SELECT * FROM test WHERE a='...'
  • a and b SELECT * FROM test WHERE a='...' AND b='...'
  • a and b and c SELECT * FROM test WHERE a='...' AND b='...' AND c='...'

But not a and c. This is because you need the partition key + zero or more of the clustering keys in the order they are defined.

Cosmetic: In the CREATE TABLE the () around a are not needed, since you do not apply a composite partition key:

CREATE TABLE test (a text, b text, c text, PRIMARY KEY (a, b, c))
tbsalling
  • 4,477
  • 4
  • 30
  • 51
  • thanks tbsalling for tip :) - so isn't any solution for using three or more unique keys ? unfortunately in scylladb indexes are not supported yet – KoLiBer Aug 08 '17 at 11:34
1

As others have already pointed out Cassandra does not support filtering while skipping parts of the clustering key. And while it is tempting to consider this as a limitation, it is helpful to take a deeper look into why this restriction exists.

First of all, the ALLOW FILTERING clause already puts stress on all of the Cassandra nodes in a cluster. As the query does not specify the partition key, each of the nodes would have to process it by loading data from a disk and discarding records that don't match provided criteria. But as far as I understand, due to the way data is stored by Cassandra in files, it can load only its subset based on the clustering key provided in a query. However, only if either all of the components of the clustering key are specified, or only one or more of the last ones are omitted.

If a query "skips" parts of the clustering key, like in your example, each node would have to load pretty much everything from a file system and sequentially look for a match. You could imagine the consequences, even if the actual number of records matched by the filter is negligible.

This post explains in more details the impact of ALLOW FILTERING while this one dives deeper into SQL WHERE clause in general.

Possible Solution

I'm sure that knowing about this restriction does not solve your problem of being able to query by c component of the partition key. As far as I could tell, revising of the data model would usually provide a better solution.

If you find yourself looking for data by c often, add one more table, where c would become a partition key. Not only you'll get all of the benefits of caching and limited data loading, but also you would limit your query to only one node. The improvements in the execution time often overweight any savings in the disk space that you might get from trying to tailor a filtering query.

oiavorskyi
  • 2,893
  • 1
  • 20
  • 23
0

In general, Scylla aims to achieve feature parity with Cassandra. To that end, Scylla's limitations on cluster key filtering is the same as Cassandra's (other comments in this thread apply). Scylla 2.0 RC1 will be released shortly bringing an experimental version of materialized views. You can read about what will and will not be supported in the 2.0 RC1 release here: http://www.scylladb.com/2017/07/27/materialized-views-preview-scylla-2-0/ .

siculars
  • 62
  • 3