1

Can someone help me with this SQL?

I am looking to pivot multiple rows into columns by: id, type, color, date

Challenge: I have multiple columns I want to preserve/be shown on the pivoted table.

id | type   | color  | date      | country_code | cost
---+--------+--------+-----------+--------------+-------
1  | report | red    | 2020-09-01| US           | 1
1  | report | red    | 2020-09-01| EU           | 2
1  | report | red    | 2020-09-01| RU           | 3
1  | report | red    | 2020-09-01| AP           | 4
1  | report | blue   | 2020-09-02| US           | 5   
1  | report | blue   | 2020-09-02| EU           | 6
1  | report | blue   | 2020-09-02| RU           | 7
1  | report | blue   | 2020-09-02| AP           | 8
2  | report | green  | 2020-09-02| US           | 9
2  | report | green  | 2020-09-02| EU           | 10
2  | report | green  | 2020-09-02| RU           | 11
2  | report | green  | 2020-09-02| AP           | 12
2  | report | blue   | 2020-09-03| US           | 13
2  | report | blue   | 2020-09-03| EU           | 14
2  | report | blue   | 2020-09-03| RU           | 15
2  | report | blue   | 2020-09-03| AP           | 16

Desired output:

id | type   | color | date       | US | EU | RU | AP
---+--------+-------+------------+----+----+----+----
1  | report | red   | 2020-09-01 | 1  | 2  | 3  | 4
1  | report | blue  | 2020-09-02 | 5  | 6  | 7  | 8
2  | report | green | 2020-09-02 | 9  | 10 | 11 | 12
2  | report | blue  | 2020-09-03 | 13 | 14 | 15 | 16

Known information:

  • there are only 4 country codes.
  • the color value will be same per: id, type, date.

Not sure what is the most cleanest/best way to write this SQL.

I have tried to use

ROW_NUMBER() OVER(PARTITION BY xxx ORDER BY yyy)

and PIVOT but couldn't get the results I was looking for

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Jacky Lam
  • 67
  • 1
  • 2
  • 7

2 Answers2

3

I think conditional aggregation does what you want:

select id, type, color, date,
       max(case when country_code = 'US' then cost end) as us,
       max(case when country_code = 'EU' then cost end) as eu,
       max(case when country_code = 'RU' then cost end) as ru,
       max(case when country_code = 'AP' then cost end) as AP
from t
group by id, type, color, date;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Sorry, i accidentally pasted the US value twice. I corrected it. So country_code should be 'US, EU, RU, AP'. – Jacky Lam Jun 13 '20 at 22:51
  • I played around and think I got this instead: ``` SELECT id, type, color, date, [US], [EU], [RU], [AP] FROM ( SELECT id, type, color, date, country_code, cost FROM tableX ) t PIVOT ( MAX(cost) FOR country_code IN ( [US], [EU], [RU], [AP] ) ) p ``` – Jacky Lam Jun 13 '20 at 22:57
  • @JackyLam . . . Then conditional aggregation is even simpler. – Gordon Linoff Jun 13 '20 at 22:58
  • ah yes! silly me, trying really complicated queries when it was pretty straightforward. Thank you soo much Gordon!! Cheeky question: in a very large dataset (over millions) do you happen to know if the performance is better using `group by` or `pivot`? (assuming i've indexed the columns correctly) – Jacky Lam Jun 13 '20 at 23:38
  • My understanding is that `pivot` does not offer a performance advantage in any database that implements it. That said, optimizers change over time, so it is always worth testing. – Gordon Linoff Jun 14 '20 at 00:41
0

I played around and think I got this:

SELECT id, type, color, date, [US], [EU], [RU], [AP]
FROM
(
  SELECT id, type, color, date, country_code, cost FROM tableX
) t
PIVOT
(
  MAX(cost)
  FOR country_code IN ( [US], [EU], [RU], [AP] )
) p
Jacky Lam
  • 67
  • 1
  • 2
  • 7