
I have 100,000 sources/sensors/symbols, and I read one value from each on a daily basis.

I load a database table with these daily values: one row per value and symbol, i.e. 100,000 rows per day.

Query1:

As I want to analyze daily data across all symbols, I partition the table by year, month, week, and day.

Query2:

But I also want to analyze trends/statistics over time for each symbol, reading up to 24 months of history per symbol.

Query2 runs along another dimension (the time axis), and I don't want it to scan many of the (y, m, w, d) partitions that are optimized for Query1.
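For concreteness, the two access patterns might look roughly like this (table and column names are illustrative, not from an actual schema):

```sql
-- Query1: one day, all symbols (cross-sectional)
SELECT symbol, value
FROM readings
WHERE day = '2017-06-15';

-- Query2: one symbol, up to 24 months of history (time series)
SELECT ts, value
FROM readings
WHERE symbol = 'SENSOR_042'
  AND ts >= '2015-07-01'
ORDER BY ts;
```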

What kind of design do you propose?

marc_s

1 Answer


100k rows per day isn't much at all; you could even create just one partition per year. But if you also want to optimize query performance by only touching single partitions, let's assume you partition by month.

Furthermore, let's assume a three-node cluster. You would then probably create the table with 6 shards, which yields a total of 6 × 12 shards per year. With one replica configured, that's 6 × 12 × 2 shards, which looks good.
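A minimal sketch of such a table in CrateDB, partitioned on a generated month column and clustered into 6 shards (the table and column names are made up for illustration):

```sql
CREATE TABLE readings (
    symbol TEXT,
    ts     TIMESTAMP WITH TIME ZONE,
    value  DOUBLE PRECISION,
    -- generated column: each row lands in its month's partition
    month  TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts)
) CLUSTERED INTO 6 SHARDS
  PARTITIONED BY (month)
  WITH (number_of_replicas = 1);
```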

If you specify the partition column in the WHERE clause, Crate will automatically wake up only the matching shards, while the whole data set stays visible through the single table.
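For example, assuming a table partitioned on a generated month column as above, a daily analysis confined to one partition might be sketched as:

```sql
-- Only the shards of the 2017-06 partition are queried;
-- all other partitions stay untouched.
SELECT symbol, avg(value)
FROM readings
WHERE month = '2017-06-01'
GROUP BY symbol;
```

The per-symbol trend query (Query2) still works against the same table; it simply spans up to 24 monthly partitions, each of which is small at 100k rows/day.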

BTW: here's a handy link to partitioning by time: https://crate.io/a/generated-columns-in-crate/

Jodok Batlogg