1

The schema I am using is following :

CREATE TABLE mytable(
id varchar,
date date,
name varchar,
PRIMARY KEY ((date),name, id)
) WITH CLUSTERING ORDER BY (name desc);

I have 2 queries for my use case :

  1. Fetching all records for given name
  2. Delete all records for given date.

As we can't delete records without partition key being specified, my partition key got fixed to date only and no other column can be added to partition key as I won't have anything except date at time of deletion.

But to fetch records using name, I need to use ALLOW FILTERING as I need to scan whole table of above schema which causes performance issue.

Can you suggest a better way so that I can skip ALLOW FILTERING with is also delete by date compatible.

cur10us
  • 141
  • 12
  • You won't be able to do this with a single table. You either need to use ALLOW FILTERING and suffer poor performance, and non-scalability, or you need an additional table. – Christophe Schmitz Mar 27 '18 at 08:46
  • If I introduce another table, how will I be able to remove the records of deleted date from results of fetching query. Can you shed some light on the additional table schema? – cur10us Mar 27 '18 at 08:54

2 Answers2

1

You could use indexes: https://docs.datastax.com/en/cql/3.3/cql/cql_using/useSecondaryIndex.html

But you have to be careful, there could be issues depending on the size of the table. You should read this post for more informations: https://pantheon.io/blog/cassandra-scale-problem-secondary-indexes

Thomas Arnaud
  • 448
  • 5
  • 16
1

You need an additional table to support your requirements. Your main query is to retrieve the records given a name. For this, you should use mytable as follow (note the primary key):

CREATE TABLE mytable(
id varchar,
date date,
name varchar,
PRIMARY KEY ((name),date, id)
) WITH CLUSTERING ORDER BY (date desc);

This table will let you retrieve your data for a given name with (query 1):

 SELECT * FROM mytable WHERE name='bob';

Now, you want to be able to delete by date. For this you would need the following additional table:

CREATE TABLE mytable_by_date(
id varchar,
date date,
name varchar,
PRIMARY KEY ((date), name, id)
) WITH CLUSTERING ORDER BY (name);

This table will let you find the name (and id) for a given date with:

SELECT * from mytable_by_date WHERE date='your-date';

I don't know your business requirements, so you this query might return 0, 1 or maybe more results. Once you have that, you can issue the delete against the first and second table (maybe using a logged batch for atomicity?)

DELETE * from mytable_by_date WHERE date='your-date' and name='the-name' and id='the-id'
DELETE * from mytable WHERE name='the-name' and ...

Overall, you might need to adjust according to your business requirements (is name unique, is uniqueness enforced by id etc...)

Hope it helps!

Christophe Schmitz
  • 2,896
  • 2
  • 14
  • 21
  • It is not clear how to get `the-name` and `the-id` as I am having only `your-date` while deleting. – cur10us Mar 27 '18 at 09:55
  • You get it from the query `SELECT * from mytable_by_date WHERE date='your-date';` – Christophe Schmitz Mar 27 '18 at 09:57
  • Can you please write valid syntax for "getting" one query's values and using them in another query as it is still not clear to me. – cur10us Mar 27 '18 at 11:33
  • Unlike SQL / traditional RDBMS where you could write this as a single query, CQL is pretty limited. This means that you need to write multiple query, and your application needs to do the job. In this case, you need to use the `SELECT * from mytabpe_by_date WHERE date=...` to retrieve the name(s) and id(s). Then, using the results, your application needs to issue the DELETE statements. – Christophe Schmitz Mar 27 '18 at 20:05
  • Do we need to use driver to do that like this : https://stackoverflow.com/questions/30373893/is-it-possible-to-use-variables-in-cql-commands-in-cql-scripts or do we have another way to use name and id obtained by 1st query and use it in another one. If possible please describe with proper syntax. – cur10us Mar 28 '18 at 02:09
  • Yes, you should use the driver, and probably use prepared statement. Have a look at this documentation: https://docs.datastax.com/en/developer/java-driver/3.4/manual/statements/prepared/ – Christophe Schmitz Mar 28 '18 at 05:16