0

I am looking for a query to do following transformation.

Basically I want to find top 3 frequent SELL_COUNTRY and top 3 frequent category, on per website, per day bases. (for example, website 1, date 6-5-2017, there are 2*US, 1*JP and 1*UK for SELL_COUNTRY, therefore TOP1_SELL_COUNTRY is US, and JP and UK going to TOP2_SELL_COUNTRY and TOP3_SELL_COUNTRY. Same idea for CATEGORY column)

My current solution involves many subqueries, which works, but I feel it is too complicated. I am interested in how sql master would do it in an elegant way.

Currently I know how to do it uses

From enter image description here

To enter image description here

user1269298
  • 717
  • 2
  • 8
  • 26

2 Answers2

1

I would do that in 3 steps:

  1. group by country and rank by count
  2. group by category and rank by count
  3. blend results using conditional aggregate (which will just place the values in the necessary cells because the result of the CASE would be just your value and many NULL values, so min() outputs the value)

Like this:

WITH
countries as (
    SELECT *, row_number() over (partition by website,date order by count desc)
    FROM (
        SELECT
         website
        ,date::date
        ,sell_country
        ,count(1)
        FROM your_table
        GROUP BY 1,2,3
    )
)
,categories as (
    SELECT *, row_number() over (partition by website,date order by count desc)
    FROM (
        SELECT
         website
        ,date::date
        ,category
        ,count(1)
        FROM your_table
        GROUP BY 1,2,3
    )
)
SELECT
 website
,date
,coalesce(min(case when t1.row_number=1 then t1.sell_country end),'NA') as top1_sell_country
,coalesce(min(case when t1.row_number=2 then t1.sell_country end),'NA') as top2_sell_country
,coalesce(min(case when t1.row_number=3 then t1.sell_country end),'NA') as top3_sell_country
,coalesce(min(case when t2.row_number=1 then t2.category end),'NA') as top1_sell_category
,coalesce(min(case when t2.row_number=2 then t2.category end),'NA') as top2_sell_category
,coalesce(min(case when t2.row_number=3 then t2.category end),'NA') as top3_sell_category
FROM countries t1 
FULL JOIN categories t2
USING (website,date)
GROUP BY 1,2
ORDER BY 1,2
AlexYes
  • 4,088
  • 2
  • 15
  • 23
  • Did not realize min() can be used on string. Good to know! Never use USING clause before. Is it equal to 'WHERE t1.website = t2.website AND t1.date = t2.date' ? – user1269298 Jul 28 '17 at 17:10
  • Seems need a minor change. 'sell_country' needs to be removed from 'SELECT *, row_number() over (partition by website,date,sell_country order by count desc)' – user1269298 Jul 28 '17 at 17:46
  • @user1269298 absolutely! as well as `category` from the second part, sorry I didn't test that:) just written out of my head. And yes, `USING` is equivalent to longer syntax of join condition and it's allowing to use `select *` without being afraid of having duplicated join columns – AlexYes Jul 28 '17 at 19:42
0
WITH a1 AS
(
SELECT *,
       COUNT(*) OVER( PARTITION BY website,SUBSTRING(visit_date,1,8),sell_country ) AS sell_cntry,
       COUNT(*) OVER( PARTITION BY website,SUBSTRING(visit_date,1,8),pur_country ) AS pur_cntry
  FROM Yourtable
),
a2 AS 
(
SELECT website,
       visit_date,
       sell_country,
       RANK() OVER ( PARTITION BY website,SUBSTRING(visit_date,1,8) ORDER BY sell_cntry DESC ) AS sell_cntry_rnk
  FROM a1
),
a3 AS
(
SELECT website,
       visit_date,
       pur_country,
       RANK() OVER ( PARTITION BY website,SUBSTRING(visit_date,1,8) ORDER BY pur_cntry DESC ) AS pur_cntry_rnk
  FROM a1
),
a4 AS
(
SELECT a2.website AS company,
       a2.v_date,
       CASE WHEN a2.sell_cntry_rn = 1 THEN a2.sell_country  END  AS TOP1_SELL_COUNTRY,
       CASE WHEN a2.sell_cntry_rn = 2 THEN a2.sell_country  END  AS TOP2_SELL_COUNTRY,
       CASE WHEN a2.sell_cntry_rn = 3 THEN a2.sell_country  END  AS TOP3_SELL_COUNTRY,
       CASE WHEN a3.pur_cntry_rn = 1 THEN a3.pur_country  END  AS TOP1_PUR_COUNTRY,
       CASE WHEN a3.pur_cntry_rn = 2 THEN a3.pur_country  END  AS TOP2_PUR_COUNTRY,
       CASE WHEN a3.pur_cntry_rn = 3 THEN a3.pur_country  END  AS TOP3_PUR_COUNTRY 
  FROM (
          SELECT Z.*,
                 ROW_NUMBER() OVER( PARTITION BY website,v_date ORDER BY sell_cntry_rnk,sell_country ) AS sell_cntry_rn
            FROM 
          (
            SELECT DISTINCT website,
                            SUBSTRING(visit_date,1,8) AS v_date,
                            sell_cntry_rnk,
                            sell_country
              FROM a2 
            ) Z
          WHERE Z.sell_cntry_rnk <= 3 
        ) a2
 INNER JOIN
       (
         SELECT *,
                ROW_NUMBER() OVER( PARTITION BY website,v_date ORDER BY pur_cntry_rnk,pur_country ) AS pur_cntry_rn
           FROM
          ( SELECT DISTINCT website,
                            SUBSTRING(visit_date,1,8) AS v_date,
                            pur_cntry_rnk,
                            pur_country
              FROM a3
           ) Z
          WHERE Z.pur_cntry_rnk <= 3 
       ) a3 
    ON a2.website =  a3.website
   AND a2.v_date = a3.v_date
),
a5 AS
(
SELECT company,
       v_date,
       MAX(TOP1_SELL_COUNTRY) AS TOP1_SELL_COUNTRY,
       MAX(TOP2_SELL_COUNTRY) AS TOP2_SELL_COUNTRY,
       MAX(TOP3_SELL_COUNTRY) AS TOP3_SELL_COUNTRY,
       MAX(TOP1_PUR_COUNTRY) AS TOP1_PUR_COUNTRY,
       MAX(TOP2_PUR_COUNTRY) AS TOP2_PUR_COUNTRY,
       MAX(TOP3_PUR_COUNTRY) AS TOP3_PUR_COUNTRY
  FROM a4
GROUP BY company,
         v_date
)
SELECT company,
       v_date,
       CASE WHEN TOP1_SELL_COUNTRY IS NULL THEN 'NA' ELSE TOP1_SELL_COUNTRY END AS TOP1_SELL_COUNTRY,
       CASE WHEN TOP2_SELL_COUNTRY IS NULL THEN 'NA' ELSE TOP2_SELL_COUNTRY END AS TOP2_SELL_COUNTRY,
       CASE WHEN TOP3_SELL_COUNTRY IS NULL THEN 'NA' ELSE TOP3_SELL_COUNTRY END AS TOP3_SELL_COUNTRY,
       CASE WHEN TOP1_PUR_COUNTRY IS NULL THEN 'NA' ELSE TOP1_PUR_COUNTRY END AS TOP1_PUR_COUNTRY,
       CASE WHEN TOP2_PUR_COUNTRY IS NULL THEN 'NA' ELSE TOP2_PUR_COUNTRY END AS TOP2_PUR_COUNTRY,
       CASE WHEN TOP3_PUR_COUNTRY IS NULL THEN 'NA' ELSE TOP3_PUR_COUNTRY END AS TOP3_PUR_COUNTRY
  FROM a5
 ORDER BY company,v_date;
Teja
  • 13,214
  • 36
  • 93
  • 155