3

I am having issues querying large volumes of data by a single day. I am looking for advice on creating an efficient table schema.

Table: eventlog

Columns: recordid (UUID), insertedtimestamp (timestamp), source (Text), event (Text)

If I simply do:

CREATE TABLE eventlog (
    recordid uuid PRIMARY KEY,
    insertedtimestamp timestamp,
    source text,
    event text
); 

Then the below query will get overwhelmed by the volume of data, assuming today is 1/25.

select * from eventlog where insertedtimestamp > '2017-01-25';

The goal is to select all the records from a single day, knowing we need to be efficient in partitioning using tables with possibly millions of records. How would I design an efficient table schema (What partition key setup)? Thank you.

user2494663
  • 179
  • 1
  • 13

1 Answers1

4

Though you want to get all the record in a single day, you can use this schema

CREATE TABLE eventlog (
    day int,
    month int,
    year int,
    recordid uuid,
    insertedtimestamp timestamp,
    source text,
    event text,
    PRIMARY KEY((day,month,year),recordid)
); 

So all of the data in a single day, will be in a single node. Now you can get data of a date say 2017-01-25 more efficiently with the below query

SELECT* FROM eventlog WHERE day = 25 and month = 1 and year = 2017 
Ashraful Islam
  • 12,470
  • 3
  • 32
  • 53
  • Hello, is separating the date this way more efficient then just a date without a timestamp? Essentially, WHERE day = 25 and month = 1 and year = 2017 vs. WHERE date = '2017-01-25' (date is a new timestamp that just stores date, no time) – user2494663 Jan 25 '17 at 17:44
  • you can also use a single field of date type, that contains only date instead of three field – Ashraful Islam Jan 25 '17 at 18:34
  • 1
    I am using a single field of date type, containing only the date. Your solution still stands as correct – user2494663 Jan 25 '17 at 18:56