Cassandra : Making an appropriate Data Model

Question

I have a table called Price in MySQL which looks like this:

+---------+-------------+------+-----+-------------------+-----------------------------+
| Field   | Type        | Null | Key | Default           | Extra                       |
+---------+-------------+------+-----+-------------------+-----------------------------+
| Current | float(20,3) | YES  |     | NULL              |                             |
| Time    | timestamp   | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+---------+-------------+------+-----+-------------------+-----------------------------+

My application needs to sum and retrieve results for the last 1 hour, the last 2 hours, and so on up to the last week, counted back from now. I am trying to move to Cassandra and want a suitable model for my data. Currently I have built a table in Cassandra which looks something like this:

CREATE TABLE IF NOT EXISTS HAS.Price (
    ID INT,
    Current float,
    Time timestamp,
    Time_uuid timeuuid,
    PRIMARY KEY (ID, Time_uuid)
);

This is not logical as it just creates one big table and I don't think this will distribute data to other nodes. I am using a fixed ID of 1 here. I believe the logical partition key in my case would be "hour", so that, for example, I can sum all the Current values from the last hour, the last 2 hours, and so on. In this case I am referring to this post. If I make the hour the partition key, all the data for, say, the 15th hour of the day will go in this row:

2015-08-06 15:00:00

and the data for the next hour will go to 2015-08-06 16:00:00. However, say the current time is 2015-08-06 16:12:43 and I want to select the records from the last hour. What will my query look like, given that part of the data is in 2015-08-06 15:00:00, which has a different partition key?

How many values are we talking about? 100/h? 10k? 100m? – ClojureMostly, Aug 12 '15 at 17:33

Aravind Chamakura · Accepted Answer · 2015-08-13T22:32:14.800

Try the following option. (I have corrected the answer.)

Design for your queries. Here are the possible queries I can see, other than up-to-the-minute ones:

Get sum for day
Get sum for hour

Get sum for last hour (any time on the hour)

CREATE TABLE mykeyspace.price (
    day text,
    hour text,
    inserttime timeuuid,
    current float,
    PRIMARY KEY ((day, hour), inserttime)
) WITH CLUSTERING ORDER BY (inserttime DESC);

Make 2 inserts for every transaction, like below:

    insert into price (day, hour, inserttime, current) VALUES ('20150813', '', now(), 2.00);
    insert into price (day, hour, inserttime, current) VALUES ('', '2015081317', now(), 2.00);

Where

day is YYYYMMDD (e.g. 20150813)
hour is YYYYMMDDhh (e.g. 2015081317)
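
As a rough client-side sketch of the two-insert scheme above, the key strings can be derived from the event time. This assumes the hour key has the shape of the example value 2015081317 (year, month, day, hour). The helper names `bucket_keys` and `insert_params` are hypothetical, and the `%s` placeholders assume a client in the style of the DataStax Python driver; nothing here talks to a cluster.

```python
from datetime import datetime


def bucket_keys(ts):
    """Return the (day, hour) key strings for a timestamp,
    e.g. ('20150813', '2015081317')."""
    return ts.strftime("%Y%m%d"), ts.strftime("%Y%m%d%H")


# Hypothetical parameterized statement. The %s placeholder style is an
# assumption about the client library, not something stated in the answer.
INSERT_CQL = (
    "INSERT INTO price (day, hour, inserttime, current) "
    "VALUES (%s, %s, now(), %s)"
)


def insert_params(ts, current):
    """Build the parameters for the two inserts per transaction:
    one row keyed by day only, one row keyed by hour only."""
    day, hour = bucket_keys(ts)
    return [(day, "", current), ("", hour, current)]


if __name__ == "__main__":
    for params in insert_params(datetime(2015, 8, 13, 17, 20), 2.00):
        print(INSERT_CQL, params)
```

Whatever time zone is used to build the keys on write has to be used again when building them for reads, otherwise the buckets will not line up.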

Select query to get the last hour at any minute, using minTimeuuid and maxTimeuuid:

    select day, hour, dateOf(inserttime) from price where day = '' and hour IN ('2015081317', '2015081316') and inserttime > maxTimeuuid('2015-08-13 16:20:00-0500') and inserttime < minTimeuuid('2015-08-13 17:20:00-0500');

Note: A range query is not allowed on a partition key. The documentation says you could use the token function, but the results are not predictable.
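
To answer the "last hour from 16:12:43" case from the question, the client has to list every hour bucket that the time window touches and pass them in the IN clause. A minimal sketch of that step follows. The helper names are hypothetical, the datetimes are assumed to be naive values in UTC (hence the `+0000` suffix), and the query is only built as a string, not executed.

```python
from datetime import datetime, timedelta


def hour_buckets(start, end):
    """Return the YYYYMMDDhh keys of every hour partition that
    overlaps the window [start, end], oldest first."""
    cursor = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while cursor <= end:
        keys.append(cursor.strftime("%Y%m%d%H"))
        cursor += timedelta(hours=1)
    return keys


def last_hours_query(now, hours):
    """Build a CQL string selecting rows from the last `hours` hours.
    Assumes hour-keyed rows were written with day = ''."""
    start = now - timedelta(hours=hours)
    in_list = ", ".join("'%s'" % key for key in hour_buckets(start, now))
    fmt = "%Y-%m-%d %H:%M:%S+0000"  # assumption: all times are UTC
    return (
        "SELECT current FROM price WHERE day = '' "
        "AND hour IN (%s) "
        "AND inserttime > maxTimeuuid('%s') "
        "AND inserttime < minTimeuuid('%s');"
        % (in_list, start.strftime(fmt), now.strftime(fmt))
    )


if __name__ == "__main__":
    print(last_hours_query(datetime(2015, 8, 13, 17, 20), 1))
```

A window of N hours that does not start on the hour touches N + 1 partitions, so a one-week window would put about 169 keys in the IN list. For the longer ranges it may be cheaper to read the day-keyed rows instead.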

Aftab · Answer 2 · 2015-08-18T07:27:57.987

This is not logical as it just creates one big table and I don't think this will distribute data to other nodes.

Yes, this won't distribute data across your nodes.

Here is what I think the solution should be:

CREATE TABLE IF NOT EXISTS HAS.Price (
    Time_uuid timeuuid,
    Current float,
    PRIMARY KEY (Time_uuid)
);

Then simply find the time_uuid for the start of the hour and the time_uuid for the end of the hour, and write a query like:

    SELECT * FROM HAS.Price WHERE time_uuid >= cdb36860-4444-11e5-8080-808080808080 AND time_uuid <= f784b8ef-450d-11e5-7f7f-7f7f7f7f7f7f;
