I have the following situations: I need to redesign a Cassandra keyspace by restructuring multiple tables.
The existing structure is the following
token@cqlsh> describe series;
CREATE TABLE series (
type text,
name text,
as_of timestamp,
data text,
hash text,
PRIMARY KEY ((type, name, as_of))
)
Instead of having the following structure:
type | name | as_of | data | hash
------+-------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------
OP | LU_STC | 2022-09-30 00:00:00.000000+0000 | {"type": "OP", "name": "LU_STC", "as_of": "2022-09-30", "data": [{"year": 2022, "month": 9, "day": 30, "hour": 0, "quarter": 3, "week": 39, "wk_year": 2022, "is_peak": 0, "value": 4.689399994443761}, {"year": 2022, "month": 9, "day": 30, "hour": 1, "quarter": 3, "week": 39, "wk_year": 2022, "is_peak": 0, "value": 12.761399943614606}], "hash": "cf4e383a370416ebbdec8a2f2ca4e28982cd9f871a9f540e4f4f6c8807a7015a"} | cf4e383a370416ebbdec8a2f2ca4e28982cd9f871a9f540e4f4f6c8807a7015a
where the data field contains more than 50k entries:
{
"type":"OP",
"name":"LU_STC",
"as_of":"2022-09-30",
"data":[
{
"year":2022,
"month":9,
"day":30,
"hour":0,
"quarter":3,
"week":39,
"wk_year":2022,
"is_peak":0,
"value":4.689399994443761
},
...
50k entries
...
{
"year":2022,
"month":9,
"day":30,
"hour":1,
"quarter":3,
"week":39,
"wk_year":2022,
"is_peak":0,
"value":12.761399943614606
}
],
"hash":"cf4e383a370416ebbdec8a2f2ca4e28982cd9f871a9f540e4f4f6c8807a7015a"
}
a table (series_test) where each data field is written like a row, having the following structure:
token@cqlsh> select * series_test where name = 'DE_STC' and as_of = '2022-09-30' and date <= '2022-10-01 00:00:00.000000+0000' ALLOW FILTERING;
name | as_of | date | commodity | type | is_peak | quarter | value | week | wk_year
--------+---------------------------------+---------------------------------+-----------+------+---------+---------+---------+------+---------
DE_STC | 2022-09-30 00:00:00.000000+0000 | 2022-09-30 00:00:00.000000+0000 | power | OP | False | 3 | -1.0377 | 39 | 2022
DE_STC | 2022-09-30 00:00:00.000000+0000 | 2022-10-01 00:00:00.000000+0000 | power | OP | False | 4 | 31.0728 | 39 | 2022
The question is how I can build the partition key to be unique and to be able to query for sets of rows based on the date column (timestamp) for getting records only for a given period?
Is the following structure correct ?
token@cqlsh> describe series_test;
CREATE TABLE series_test (
name text,
as_of timestamp,
date timestamp,
commodity text,
type text,
is_peak boolean,
quarter int,
value float,
week int,
wk_year int,
PRIMARY KEY ((name, as_of, date), commodity, type)
)