I've been working on a data model for an analytics system for a little while now, but I just can't seem to get the right setup for my primary keys. I've watched a bunch of videos (https://www.youtube.com/watch?v=UP74jC1kM3w&list=PLqcm6qE9lgKJoSWKYWHWhrVupRbS8mmDA&index=9) to gain some learnings about best practises, especially concerning time-series data.
Regarding PRIMARY KEYS, I can't seem to get the right balance so that I can query the way I need to.
Here is my data model so far:
CREATE TABLE eventPropertyCountsByDay (
user_id int,
event_type varchar,
property varchar,
value varchar,
date_to_day varchar,
count counter,
PRIMARY KEY ((event_type, user_id), date_to_day, property, value)
) WITH CLUSTERING ORDER BY (date_to_day DESC, property DESC, value DESC);
I am storing events in another table and event properties in this table (column family).
I need to be able to query according to the user id, with an IN query to get records for multiple users at once, but I also need to query on the property and value fields, as well as specifying a date range.
Here is an example of the query I'm trying to achieve:
SELECT * FROM eventPropertyCountsByWeek
WHERE event_type = 'some_event'
AND date_to_day > '2014-09-24'
AND user_id IN (123, 456)
AND property = 'property_name'
AND value = 'property_value'
How can I accomplish this sort of query? What kind of other column families might I need to introduce to break it down?