9

I am working with some course data in a Presto database. The data in the table looks like:

student_id  period   score completed
1           2016_Q1  3     Y
1           2016_Q3  4     Y
3           2017_Q1  4     Y
4           2018_Q1  2     N

I would like to format the data so that it looks like:

student_id  2018_Q1_score 2018_Q1_completed 2017_Q3_score
1           0             N                 5
3           4             Y                 4
4           2             N                 2

I know that I could do this by joining to the table for each time period, but I wanted to ask here to see if any gurus had a recommendation for a more scalable solution (e.g. perhaps not having to manually create a new join for each period). Any suggestions?

aguadamuz
  • 321
  • 1
  • 3
  • 12
  • 1
    Are you looking for pivot table solution? – Prashant Pimpale Jul 17 '18 at 02:54
  • 1
    Prashant...after I wrote this I realized that I should be able to use a to_map() function in Presto to achieve my desired output as my need does appear to call for a pivot solution. Even using a pivot solution, I am not sure how I would get the column names into my desired output. – aguadamuz Jul 17 '18 at 03:02

1 Answers1

8

You can just use conditional aggregation:

select student_id,
       max(case when period = '2018_Q1' then score else 0 end) as score_2018q1,
       max(case when period = '2018_Q1' then completed then 'N' end) as completed_2018q1,
       max(case when period = '2017_Q3' then score else 0 end) as score_2017q3
from t
group by student_id
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    I would suggest `max(score) filter(where period = '2018_Q1')` over `max(case when period = '2018_Q1' then score else 0 end)` – Piotr Findeisen Jul 18 '18 at 06:30
  • 1
    @PiotrFindeisen . . . . Pretty much the same. If I use `filter`, then I need to introduce `coalesce()` for the scores. I do agree that `filter` is generally preferable. – Gordon Linoff Jul 18 '18 at 12:16
  • 3
    Is there a way to achieve this functionality in an automated way, instead of needing to manually create conditional logic for all desired columns? – psrpsrpsr Dec 17 '20 at 18:49
  • 2
    @psrpsrpsr . . . You can construct the SQL query as a string and then run the query. An application language such as Python can be used. – Gordon Linoff Dec 17 '20 at 19:31
  • 1
    Is there a more automatic way of doing it instead of manually typing in dates? – Bananeen Oct 08 '22 at 23:38