
I have a table daily_data that stores per-day metrics of stores, such as entries, sales, and hours (there can be about 100 metric types). My plan is to calculate conversion, entries/hour, sales/hour, etc. for many stores (there can be about 1000 stores).
The report range can be 1 day, 2 days, or months, or years.

| id_pos | date       | data_type | value |
|--------|------------|-----------|-------|
| 1      | 2021-11-01 | 1         | 100   |
| 1      | 2021-11-01 | 2         | 20    |
| 1      | 2021-11-01 | 3         | 8     |
| 2      | 2021-11-02 | 1         | 50    |
| 2      | 2021-11-02 | 2         | 10    |
| 2      | 2021-11-02 | 3         | 8     |
| ...    | ...        | ...       | ...   |

data_type column explanation:

  1. entries
  2. sales
  3. hours

*id_pos: ID of store location
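For reference, a minimal sketch of what the daily_data table described above might look like (the column types and the primary key are assumptions, not given in the question):

```sql
-- Hypothetical DDL matching the sample rows above; types are assumptions
CREATE TABLE daily_data (
    id_pos    integer  NOT NULL,  -- store location ID
    "date"    date     NOT NULL,
    data_type smallint NOT NULL,  -- 1 = entries, 2 = sales, 3 = hours
    value     numeric  NOT NULL,
    PRIMARY KEY (id_pos, "date", data_type)
);
```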

My query:

```sql
SELECT id_pos, data_type, SUM(value) AS data
FROM daily_data
WHERE id_pos IN (1, 2)
  AND date >= '2021-11-01' AND date <= '2021-11-30'
GROUP BY id_pos, data_type;
```

After executing the above query, I post-process the result in application code to calculate conversion (sales/entries), sales per hour, entries per hour, etc. (the formulas have many exception cases to handle).

| id_pos | data_type        | value |
|--------|------------------|-------|
| 1      | conversion       | 20%   |
| 1      | entries per hour | 20.1  |
| 1      | sales per hour   | 5.6   |
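The per-metric handling in application code could instead be pushed into the query itself with conditional aggregation. A sketch in Postgres syntax (not from the question; `NULLIF` guards against division by zero, one of the "exception cases"):

```sql
-- Pivot the three data_type rows into columns, then derive the ratios
SELECT id_pos,
       SUM(value) FILTER (WHERE data_type = 2)::numeric
         / NULLIF(SUM(value) FILTER (WHERE data_type = 1), 0) AS conversion,
       SUM(value) FILTER (WHERE data_type = 1)::numeric
         / NULLIF(SUM(value) FILTER (WHERE data_type = 3), 0) AS entries_per_hour,
       SUM(value) FILTER (WHERE data_type = 2)::numeric
         / NULLIF(SUM(value) FILTER (WHERE data_type = 3), 0) AS sales_per_hour
FROM daily_data
WHERE id_pos IN (1, 2)
  AND "date" BETWEEN '2021-11-01' AND '2021-11-30'
GROUP BY id_pos;
```

This still scans the same rows, so it does not by itself solve the performance concern, but it removes the formula handling from application code.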

The question:
This structure is fine for small data, but once the table reaches 10-20 million rows, query performance will suffer.
So my plan is to create a new "total_data" table that stores the precomputed conversion, entries per hour, and sales per hour (the output of the query above). That way I can select the final data directly, without applying any formulas in application code.

| id_pos | date_begin | date_end   | data_type    | value |
|--------|------------|------------|--------------|-------|
| 1      | 2021-11-01 | 2021-11-30 | conversion   | 10    |
| 1      | 2021-11-01 | 2021-11-30 | entries/hour | 20.1  |
| 1      | 2021-11-01 | 2021-11-30 | sales/hour   | 5.1   |
| 2      | 2021-11-01 | 2021-11-30 | conversion   | 22    |
| 2      | 2021-11-01 | 2021-11-30 | entries/hour | 6.2   |
| 2      | 2021-11-01 | 2021-11-30 | sales/hour   | 6.5   |

But the problem is the date range: it can be 2021-11-02 to 2021-11-05, or thousands of other combinations. How can I design a proper "total_data" table?
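Since arbitrary date ranges cannot all be precomputed, one common approach (a suggestion on my part, not from the thread) is to pre-aggregate one row per store per day rather than per range: the sums are additive across days, so any range reduces to summing a handful of narrow rows, and the ratios are then computed from the summed numerators and denominators. A sketch (daily_totals is a hypothetical table name):

```sql
-- One summarized row per store per day; re-populated by a nightly job
CREATE TABLE daily_totals AS
SELECT id_pos,
       "date",
       SUM(value) FILTER (WHERE data_type = 1) AS entries,
       SUM(value) FILTER (WHERE data_type = 2) AS sales,
       SUM(value) FILTER (WHERE data_type = 3) AS hours
FROM daily_data
GROUP BY id_pos, "date";

-- Any client-chosen range now touches only one row per store per day
SELECT id_pos,
       SUM(sales)::numeric   / NULLIF(SUM(entries), 0) AS conversion,
       SUM(entries)::numeric / NULLIF(SUM(hours), 0)   AS entries_per_hour,
       SUM(sales)::numeric   / NULLIF(SUM(hours), 0)   AS sales_per_hour
FROM daily_totals
WHERE id_pos IN (1, 2)
  AND "date" BETWEEN '2021-11-02' AND '2021-11-05'
GROUP BY id_pos;
```

Note that the ratios must be computed as ratios of sums over the range, not averages of per-day ratios, which is why the raw sums (entries, sales, hours) are what gets stored, not the derived metrics.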

sontd
  • "the problem is the the date range, its can be 2021-11-02 to 2021-11-05 or thousand of cases" => What is the logic to generate automatically the date ranges ? We need to know the logic to be implemented. – Edouard Dec 02 '21 at 09:37
  • hi @Edouard : the date range is from client or analysis expert side, they can choose any date that they want. – sontd Dec 02 '21 at 09:40
  • before storing data in the new total_data table, you need first to create a function with the expected date range as input parameters, and the result of the query as output parameter. Then the function result can be stored in the table. – Edouard Dec 02 '21 at 09:45
  • Do you have an index on `(id_pos, date)`? –  Dec 02 '21 at 10:01
  • @Edouard : yes I did this step already, just got stuck with the final step. – sontd Dec 02 '21 at 10:02
  • @a_horse_with_no_name : I tried with all index possible, it's slow down the query when the data so big. anyway I have to apply ML, DL for it, so I have to find a solution for "total" daily data – sontd Dec 02 '21 at 10:06
    Please **[edit]** your question and add the [execution plan](https://www.postgresql.org/docs/current/static/using-explain.html) generated using **`explain (analyze, buffers, format text)`** (_not_ just a "simple" explain) as [formatted text](http://stackoverflow.com/help/formatting) and make sure you preserve the indention of the plan. Paste the text, then put `\`\`\`` on the line before the plan and on a line after the plan. Please also include the definition of all indexes you have (or tried) –  Dec 02 '21 at 10:25

1 Answer


Assuming that you always want the values 1 and 2 for id_pos, you could define a partial index to speed up your query:

```sql
CREATE INDEX ON daily_data ("date") WHERE id_pos IN (1, 2);
```

If the values for id_pos are variable, your best option is to have two indexes, one on id_pos and one on "date".
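For illustration, those two single-column indexes could be defined as follows; a combined index on `(id_pos, "date")`, as raised in the comments, is another option worth benchmarking against them:

```sql
-- Two independent indexes; the planner can combine them with a bitmap AND
CREATE INDEX ON daily_data (id_pos);
CREATE INDEX ON daily_data ("date");
```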

Laurenz Albe