1

I've been working on a data model for an analytics system for a little while now, but I just can't seem to get the right setup for my primary keys. I've watched a bunch of videos (https://www.youtube.com/watch?v=UP74jC1kM3w&list=PLqcm6qE9lgKJoSWKYWHWhrVupRbS8mmDA&index=9) to gain some learnings about best practises, especially concerning time-series data.

Regarding PRIMARY KEYS, I can't seem to get the right balance so that I can query the way I need to.

Here is my data model so far:

CREATE TABLE eventPropertyCountsByDay (
    user_id int,
    event_type varchar,
    property varchar,
    value varchar,
    date_to_day varchar,
    count counter,
    PRIMARY KEY ((event_type, user_id), date_to_day, property, value)
) WITH CLUSTERING ORDER BY (date_to_day DESC, property DESC, value DESC);

I am storing events in another table and event properties in this table (column family).

I need to be able to query according to the user id, with an IN query to get records for multiple users at once, but I also need to query on the property and value fields, as well as specifying a date range.

Here is an example of the query I'm trying to achieve:

SELECT * FROM eventPropertyCountsByWeek 
WHERE event_type = 'some_event' 
AND date_to_day > '2014-09-24'
AND user_id IN (123, 456) 
AND property = 'property_name' 
AND value = 'property_value'

How can I accomplish this sort of query? What kind of other column families might I need to introduce to break it down?

Sneaksta
  • 1,041
  • 4
  • 24
  • 46
  • Before we get into exactly how many tables you need, we should talk about the queries first. What are you trying to support on the application side? You have counts by day and week. Are those typical or the only queries you are trying to support? – Patrick McFadin Sep 24 '14 at 23:22
  • @PatrickMcFadin hey mate! I've whipped up a quick google doc for my answer. https://docs.google.com/document/d/16EH8Qpsm315zK-cJatR_VUAzVf7TdjFIDpLithw9Eto/edit?usp=sharing What's your email? – Sneaksta Sep 24 '14 at 23:44

1 Answers1

2

Try this:

CREATE TABLE eventPropertyCountsByDay (
  user_id int,
  event_type varchar,
  property varchar,
  value varchar,
  date_to_day int, // day number
  count counter,
  PRIMARY KEY ((event_type, user_id), property, value, date_to_day)
) WITH CLUSTERING ORDER BY (property DESC, value DESC, date_to_day DESC);

I moved date_to_day at the end of clustering key to make it usable for range queries with fixed property and value.

Data updating query:

update eventPropertyCountsByDay set count = count + 1 where 
  user_id=1 and 
  event_type='log' and 
  property='prop1' and 
  value='val1' and 
  date_to_day=54321;

Select query:

select * from eventPropertyCountsByDay 
  where event_type='log' and 
    user_id=1 and 
    property='prop1' and
    value='val1' and 
    date_to_day > 54300;

 event_type | user_id | property | value | date_to_day | count
------------+---------+----------+-------+-------------+-------
        log |       1 |    prop1 |  val1 |       54323 |     2
        log |       1 |    prop1 |  val1 |       54321 |     1
shutty
  • 3,298
  • 16
  • 27