-2

I have a database in postgresql with product_id, store_id, week, price_avg, price_min, price_max and I want to select a set of homogeneous products that it is present in a set of stores and in the same weeks.

I have tried some approaches but it´s difficult since I have 22 million of prices, 2645 different stores and data for 13 weeks.

The goal is to select a set of 20 product_ids that are present in all 13 weeks of information and that they appear in a set of 10 stores all the time.

First I thought on splitting the database in tables per store and then do an inner join with product_id. But that's costly for SQL.

Second, I took first two weeks in a table. and I run a

select 
    count() as quantity, f2.product, f2.key_all_stores 
from 
    database f2 
group by 
    product_id, key_all_stores 
having 
    count() = 2 

The having count 2 keeps those products in both week. and then with those products selected I run the following:

select 
    count(), product_id 
from 
    (select 
         count() as quantity, f2.product_id, f2.clave_todo_junto 
     from 
         fer f2 
     group by 
         producto, key_all_stores 
     having 
         count() = 2) as subquery 
group by 
    subquery.product_id 
having 
    count() = 4 

The issue with these two queries is that I am looking at two weeks and doing a sample with 4 different stores.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
ferchu109
  • 1
  • 3
  • 6
    Edit your question and provide sample data and desired results. – Gordon Linoff Sep 26 '18 at 16:40
  • 4
    Looks like homework... What did you try so far? – Robert Kock Sep 26 '18 at 16:43
  • 3
    Don't tell in comment. Update your question with your queries.. Better provide some data and desired result for better answer – dwir182 Sep 26 '18 at 17:11
  • Hi. Please read & act on [mcve]. Including: Show relevant queries you can do. Develop/debug with representative data. Show that your program calculates what you expect it to as it goes through (sub)expressions. PS "present in all" suggests relational division. Read about it & expressing it in SQL. PS What do you mean "appear all the time"? "I am looking at ..." does not elicidate/explain "The issue with these two queries". Please use enough words, phrases & sentences & references to parts of examples to be clear. – philipxy Sep 26 '18 at 19:37

1 Answers1

0

I have a database in postgresql with product_id, store_id, week, price_avg, price_min, price_max … The Goal is to select a set of 20 product_ids that are present in all 13 weeks of information and that they appear in a set of 10 stores all the time.

If all of these columns are still inside a single table, as stated, you may try the following shortcut :

WITH shortlist(product_id,store_id,week) AS
(
    SELECT DISTINCT product_id,store_id,week
               FROM yourtable
              WHERE store_id IN (store1,store2,store3,store4)
                AND week     IN (week1,week2,week3,…)
)

  SELECT product_id,count(*)
    FROM shortlist
GROUP BY product_id
  HAVING count(*) >= 130

You may dismiss the AND row if you're sure that your table contains nothing else than the 13 quoted weeks. Specify the requested store ids the upper one and replace yourtable by the actual name.

The point here is that making a DISTINCT pre-selection first with only concerned fields will get rid of every duplicated row if you have multiple data regarding price for a given price-store-week combination.

Given this, a product that appear all of these 13 weeks in all 10 different stores should produce exactly 130 entries. Thus the final count.

Obsidian
  • 3,719
  • 8
  • 17
  • 30
  • I have checked and I don´t have such a product that is present in all stores. I cannot do it with a multiplication of week times stores. – ferchu109 Nov 14 '18 at 14:06