
I have a table daily_data that stores per-day metrics of stores, such as entries, sales, and hours (there can be about 100 metric types). My plan is to calculate conversion, entries/hour, sales/hour, etc. for many stores (there can be about 1000 stores).
The report range can be 1 day, 2 days, or months, or years.

| id_pos | date       | data_type | value |
|--------|------------|-----------|-------|
| 1      | 2021-11-01 | 1         | 100   |
| 1      | 2021-11-01 | 2         | 20    |
| 1      | 2021-11-01 | 3         | 8     |
| 2      | 2021-11-02 | 1         | 50    |
| 2      | 2021-11-02 | 2         | 10    |
| 2      | 2021-11-02 | 3         | 8     |
| ...    | ...        | ...       | ...   |

data_type column explanation:

  1. entries
  2. sales
  3. hours

*id_pos: ID of store location
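For reference, a minimal sketch of what the daily_data table described above might look like (the column types and the primary key are assumptions, not given in the question):

```sql
-- Hypothetical DDL matching the sample rows above; types are assumptions
CREATE TABLE daily_data (
    id_pos    integer  NOT NULL,  -- store location ID
    "date"    date     NOT NULL,
    data_type smallint NOT NULL,  -- 1 = entries, 2 = sales, 3 = hours
    value     numeric  NOT NULL,
    PRIMARY KEY (id_pos, "date", data_type)
);
```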

My query:

```sql
SELECT id_pos, data_type, SUM(value) AS data
FROM daily_data
WHERE id_pos IN (1, 2)
  AND date >= '2021-11-01' AND date <= '2021-11-30'
GROUP BY id_pos, data_type;
```

After executing the above query, I post-process the result in application code to calculate conversion (sales/entries), sales per hour, entries per hour, etc. (the formulas have many exception cases to handle).

| id_pos | data_type        | value |
|--------|------------------|-------|
| 1      | conversion       | 20%   |
| 1      | entries per hour | 20.1  |
| 1      | sales per hour   | 5.6   |
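The per-metric handling in application code could instead be pushed into the query itself with conditional aggregation. A sketch in Postgres syntax (not from the question; `NULLIF` guards against division by zero, one of the "exception cases"):

```sql
-- Pivot the three data_type rows into columns, then derive the ratios
SELECT id_pos,
       SUM(value) FILTER (WHERE data_type = 2)::numeric
         / NULLIF(SUM(value) FILTER (WHERE data_type = 1), 0) AS conversion,
       SUM(value) FILTER (WHERE data_type = 1)::numeric
         / NULLIF(SUM(value) FILTER (WHERE data_type = 3), 0) AS entries_per_hour,
       SUM(value) FILTER (WHERE data_type = 2)::numeric
         / NULLIF(SUM(value) FILTER (WHERE data_type = 3), 0) AS sales_per_hour
FROM daily_data
WHERE id_pos IN (1, 2)
  AND "date" BETWEEN '2021-11-01' AND '2021-11-30'
GROUP BY id_pos;
```

This still scans the same rows, so it does not by itself solve the performance concern, but it removes the formula handling from application code.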

The question:
This structure is fine for small data, but once the table reaches 10-20 million rows, query performance will suffer.
So my plan is to create a new "total_data" table that stores the precomputed conversion, entries per hour, and sales per hour (the output of the query above). That way I can select the final data directly, without applying any formulas in application code.

| id_pos | date_begin | date_end   | data_type    | value |
|--------|------------|------------|--------------|-------|
| 1      | 2021-11-01 | 2021-11-30 | conversion   | 10    |
| 1      | 2021-11-01 | 2021-11-30 | entries/hour | 20.1  |
| 1      | 2021-11-01 | 2021-11-30 | sales/hour   | 5.1   |
| 2      | 2021-11-01 | 2021-11-30 | conversion   | 22    |
| 2      | 2021-11-01 | 2021-11-30 | entries/hour | 6.2   |
| 2      | 2021-11-01 | 2021-11-30 | sales/hour   | 6.5   |

But the problem is the date range: it can be 2021-11-02 to 2021-11-05, or thousands of other combinations. How can I design a proper "total_data" table?
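Since arbitrary date ranges cannot all be precomputed, one common approach (a suggestion on my part, not from the thread) is to pre-aggregate one row per store per day rather than per range: the sums are additive across days, so any range reduces to summing a handful of narrow rows, and the ratios are then computed from the summed numerators and denominators. A sketch (daily_totals is a hypothetical table name):

```sql
-- One summarized row per store per day; re-populated by a nightly job
CREATE TABLE daily_totals AS
SELECT id_pos,
       "date",
       SUM(value) FILTER (WHERE data_type = 1) AS entries,
       SUM(value) FILTER (WHERE data_type = 2) AS sales,
       SUM(value) FILTER (WHERE data_type = 3) AS hours
FROM daily_data
GROUP BY id_pos, "date";

-- Any client-chosen range now touches only one row per store per day
SELECT id_pos,
       SUM(sales)::numeric   / NULLIF(SUM(entries), 0) AS conversion,
       SUM(entries)::numeric / NULLIF(SUM(hours), 0)   AS entries_per_hour,
       SUM(sales)::numeric   / NULLIF(SUM(hours), 0)   AS sales_per_hour
FROM daily_totals
WHERE id_pos IN (1, 2)
  AND "date" BETWEEN '2021-11-02' AND '2021-11-05'
GROUP BY id_pos;
```

Note that the ratios must be computed as ratios of sums over the range, not averages of per-day ratios, which is why the raw sums (entries, sales, hours) are what gets stored, not the derived metrics.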

sontd
  • "the problem is the the date range, its can be 2021-11-02 to 2021-11-05 or thousand of cases" => What is the logic to generate automatically the date ranges ? We need to know the logic to be implemented. – Edouard Dec 02 '21 at 09:37
  • hi @Edouard : the date range is from client or analysis expert side, they can choose any date that they want. – sontd Dec 02 '21 at 09:40
  • before storing data in the new total_data table, you need first to create a function with the expected date range as input parameters, and the result of the query as output parameter. Then the function result can be stored in the table. – Edouard Dec 02 '21 at 09:45
  • Do you have an index on `(id_pos, date)`? –  Dec 02 '21 at 10:01
  • @Edouard : yes I did this step already, just got stuck with the final step. – sontd Dec 02 '21 at 10:02
  • @a_horse_with_no_name : I tried with all index possible, it's slow down the query when the data so big. anyway I have to apply ML, DL for it, so I have to find a solution for "total" daily data – sontd Dec 02 '21 at 10:06
    Please **[edit]** your question and add the [execution plan](https://www.postgresql.org/docs/current/static/using-explain.html) generated using **`explain (analyze, buffers, format text)`** (_not_ just a "simple" explain) as [formatted text](http://stackoverflow.com/help/formatting) and make sure you preserve the indention of the plan. Paste the text, then put `\`\`\`` on the line before the plan and on a line after the plan. Please also include the definition of all indexes you have (or tried) –  Dec 02 '21 at 10:25

1 Answer


Assuming that you always want the values 1 and 2 for id_pos, you could define a partial index to speed up your query:

```sql
CREATE INDEX ON daily_data ("date") WHERE id_pos IN (1, 2);
```

If the values for id_pos are variable, your best option is to have two indexes, one on id_pos and one on "date".
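For illustration, those two single-column indexes could be defined as follows; a combined index on `(id_pos, "date")`, as raised in the comments, is another option worth benchmarking against them:

```sql
-- Two independent indexes; the planner can combine them with a bitmap AND
CREATE INDEX ON daily_data (id_pos);
CREATE INDEX ON daily_data ("date");
```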

Laurenz Albe