1

Table schema goes like this:

 image_id | activity_time | image_location
----------+---------------+--------------------
      243 |    2021-10-22 | remotelocation_243
      243 |    2021-10-25 | remotelocation_243
       88 |    2021-10-12 |  remotelocation_88
      215 |    2021-10-20 | remotelocation_215
      215 |    2021-10-21 | remotelocation_215
      215 |    2021-10-22 | remotelocation_215
      215 |    2021-10-25 | remotelocation_215
       80 |    2021-10-12 |  remotelocation_80
      248 |    2021-10-20 | remotelocation_248
      248 |    2021-10-21 | remotelocation_248
      248 |    2021-10-22 | remotelocation_248
      248 |    2021-10-25 | remotelocation_248
      234 |    2021-10-20 | remotelocation_234
      234 |    2021-10-21 | remotelocation_234
      234 |    2021-10-22 | remotelocation_234
      234 |    2021-10-25 | remotelocation_234
       11 |    2021-10-12 |  remotelocation_11
      501 |    2021-10-22 | remotelocation_501
        1 |    2021-10-12 |   remotelocation_1
      509 |    2021-10-22 | remotelocation_509
       78 |    2021-10-12 |  remotelocation_78
       96 |    2021-10-12 |  remotelocation_96
      539 |    2021-10-22 | remotelocation_539

I want to get last N records basis the activity_time. I read following:

Cassandra + Fetch the last records using in query

Error creating table in cassandra - Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directiv

Order latest records by timestamp in Cassandra

However, I am seeing that it requires some sort of where clause to get results from order by.

I just want to do something like this:

select * from table_name order by activity_time desc limit 20;

Any help is greatly appreciated. Thanks in advance.

Aaron
  • 55,518
  • 11
  • 116
  • 132
harsh solanki
  • 47
  • 1
  • 5

1 Answers1

1
select * from table_name order by activity_time desc limit 20;

So this is known as an "unbound" query (SELECT without WHERE). This is a known Cassandra anti-pattern, because without a WHERE clause filtering on the partition key, every node in the cluster will be contacted. In a large cluster scenario, that could mean > 100 nodes. Also, one node needs to prepare and return the results (known as a "coordinator") and coordinator nodes have crashed from queries like this.

With the current table structure, the best you could do is to query the last N records for an individual image_id.

To answer your actual question, you would first need to duplicate your data into a table designed to support the query. As you mainly care about recent data, it might make sense to use a time "bucket" to partition your data.

The actual time unit chosen as the "bucket" will vary by your business requirements. Given the data set shown above, I'll use "month" for this example.

CREATE TABLE stackoverflow.images_by_month (
    month int,
    activity_time date,
    image_id int,
    image_location text,
    PRIMARY KEY (month, activity_time, image_id)
) WITH CLUSTERING ORDER BY (activity_time DESC, image_id ASC)

Now I can do something like this:

> SELECT * FROm images_by_month WHERE month=202110 LIMIT 10;

 month  | activity_time | image_id | image_location
--------+---------------+----------+--------------------
 202110 |    2021-10-25 |      215 | remotelocation_215
 202110 |    2021-10-25 |      234 | remotelocation_234
 202110 |    2021-10-25 |      243 | remotelocation_243
 202110 |    2021-10-25 |      248 | remotelocation_248
 202110 |    2021-10-22 |      215 | remotelocation_215
 202110 |    2021-10-22 |      234 | remotelocation_234
 202110 |    2021-10-22 |      243 | remotelocation_243
 202110 |    2021-10-22 |      248 | remotelocation_248
 202110 |    2021-10-21 |      215 | remotelocation_215
 202110 |    2021-10-21 |      234 | remotelocation_234

Notice that I'm already specifying the CLUSTERING ORDER on my table, so I don't need to bother with it in the query. The data just comes off of the disk in that order.

If month creates too many rows per partition, maybe try a time unit like week or even day.

Aaron
  • 55,518
  • 11
  • 116
  • 132