I have a database in postgresql with product_id, store_id, week, price_avg, price_min, price_max
and I want to select a set of homogeneous products that it is present in a set of stores and in the same weeks.
I have tried some approaches but it´s difficult since I have 22 million of prices, 2645 different stores and data for 13 weeks.
The goal is to select a set of 20 product_ids that are present in all 13 weeks of information and that they appear in a set of 10 stores all the time.
First I thought on splitting the database in tables per store and then do an inner join with product_id. But that's costly for SQL.
Second, I took first two weeks in a table. and I run a
select
count() as quantity, f2.product, f2.key_all_stores
from
database f2
group by
product_id, key_all_stores
having
count() = 2
The having count 2 keeps those products in both week. and then with those products selected I run the following:
select
count(), product_id
from
(select
count() as quantity, f2.product_id, f2.clave_todo_junto
from
fer f2
group by
producto, key_all_stores
having
count() = 2) as subquery
group by
subquery.product_id
having
count() = 4
The issue with these two queries is that I am looking at two weeks and doing a sample with 4 different stores.