1

I have the following situations: I need to redesign a Cassandra keyspace by restructuring multiple tables.

The existing structure is the following

token@cqlsh> describe series;

CREATE TABLE series (
    type text,
    name text,
    as_of timestamp,
    data text,
    hash text,
    PRIMARY KEY ((type, name, as_of))
)

Instead of having the following structure:

 type | name              | as_of                           | data                                                                                                                                                                                                                                                                                                                                                                                                                         | hash
------+-------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------
   OP | LU_STC            | 2022-09-30 00:00:00.000000+0000 | {"type": "OP", "name": "LU_STC", "as_of": "2022-09-30", "data": [{"year": 2022, "month": 9, "day": 30, "hour": 0, "quarter": 3, "week": 39, "wk_year": 2022, "is_peak": 0, "value": 4.689399994443761}, {"year": 2022, "month": 9, "day": 30, "hour": 1, "quarter": 3, "week": 39, "wk_year": 2022, "is_peak": 0, "value": 12.761399943614606}], "hash": "cf4e383a370416ebbdec8a2f2ca4e28982cd9f871a9f540e4f4f6c8807a7015a"} | cf4e383a370416ebbdec8a2f2ca4e28982cd9f871a9f540e4f4f6c8807a7015a

where the data field contains more than 50k entries:

{
   "type":"OP",
   "name":"LU_STC",
   "as_of":"2022-09-30",
   "data":[
      {
         "year":2022,
         "month":9,
         "day":30,
         "hour":0,
         "quarter":3,
         "week":39,
         "wk_year":2022,
         "is_peak":0,
         "value":4.689399994443761
      },
      ...
      50k entries
      ...
      {
         "year":2022,
         "month":9,
         "day":30,
         "hour":1,
         "quarter":3,
         "week":39,
         "wk_year":2022,
         "is_peak":0,
         "value":12.761399943614606
      }
   ],
   "hash":"cf4e383a370416ebbdec8a2f2ca4e28982cd9f871a9f540e4f4f6c8807a7015a"
}

a table (series_test) where each data field is written like a row, having the following structure:

token@cqlsh> select * series_test where name = 'DE_STC' and as_of = '2022-09-30' and date <= '2022-10-01 00:00:00.000000+0000' ALLOW FILTERING;

 name   | as_of                           | date                            | commodity | type | is_peak | quarter | value   | week | wk_year
--------+---------------------------------+---------------------------------+-----------+------+---------+---------+---------+------+---------
 DE_STC | 2022-09-30 00:00:00.000000+0000 | 2022-09-30 00:00:00.000000+0000 |     power |   OP |   False |       3 | -1.0377 |   39 |    2022
 DE_STC | 2022-09-30 00:00:00.000000+0000 | 2022-10-01 00:00:00.000000+0000 |     power |   OP |   False |       4 | 31.0728 |   39 |    2022

The question is how I can build the partition key to be unique and to be able to query for sets of rows based on the date column (timestamp) for getting records only for a given period?

Is the following structure correct ?

token@cqlsh> describe series_test;

CREATE TABLE series_test (
    name text,
    as_of timestamp,
    date timestamp,
    commodity text,
    type text,
    is_peak boolean,
    quarter int,
    value float,
    week int,
    wk_year int,
    PRIMARY KEY ((name, as_of, date), commodity, type)
)
R13mus
  • 752
  • 11
  • 20

0 Answers0