1

Use case: Find maximum counter value in a specific id range

I want to create a table with these columns: time_epoch int, t_counter counter

The frequent query is:

select time_epoch, MAX t_counter where time_epoch >= ... and time_epoch < ...

This is to find the counter in specific time range. Planning to make time_epoch as primary key. I am not able to query the data. It is always asking for ALLOW FILTERING. Since its a very costly function, We dont want to use it.

How to design the table and query for the use case.

Aaron
  • 55,518
  • 11
  • 116
  • 132

2 Answers2

2

Let's assume that we can "bucket" (partition) your data by day, assuming that enough write won't happen in a day to make the partitions too large. Then, we can cluster by time_epoch in DESCending order. With time based data, storing data in descending order often makes the most sense (as business reqs usually care more about the most-recent data).

Therefore, I'd build a table like this:

CREATE TABLE event_counter (
    day bigint,
    time_epoch timestamp,
    t_counter counter,
    PRIMARY KEY(day,time_epoch))
WITH CLUSTERING ORDER BY (time_epoch DESC);

After inserting a few rows, the clustering order becomes evident:

> SELECT * FROM event_counter ;
    WHERE day=20210219 
      AND time_epoch>='2021-02-18 18:00'
      AND time_epoch<'2021-02-19 8:00';

 day      | time_epoch                      | t_counter
----------+---------------------------------+-----------
 20210219 | 2021-02-19 14:09:21.625000+0000 |         1
 20210219 | 2021-02-19 14:08:32.913000+0000 |         2
 20210219 | 2021-02-19 14:08:28.985000+0000 |         1
 20210219 | 2021-02-19 14:08:05.389000+0000 |         1

(4 rows)

Now SELECTing the MAX t_counter in that range should work:

> SELECT day,max(t_counter) as max
FROM event_counter
WHERE day=20210219
  AND time_epoch>='2021-02-18 18:00'
  AND time_epoch<'2021-02-19 09:00';

 day      | max
----------+-----
 20210219 |   2
Aaron
  • 55,518
  • 11
  • 116
  • 132
0

Unfortunately there is no better way. Think about it.

If you know cassandra architecture then you would know that your data is spread across multiple nodes based on primary key. only way to filter on values from primary key would be to transverse each node which is essentially what "ALLOW FILTERING" is done.

indolentdeveloper
  • 1,253
  • 1
  • 11
  • 15
  • 1
    Thanks for the reply. But what about adding a partition key of some sort ? and making time_epoch as private key with clustering order. – Yakshup Goyal Feb 19 '21 at 07:07