5

I have a table that looks as follows: enter image description here

I need to unpivot the Rating and the Comments as follows: enter image description here

What is the best way to do this in Snowflake?

Note: there are some cells in the comment columns that are NULL

Adding details:

create or replace table reviews(name varchar(50), acting_rating int, acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, NULL, 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, NULL);
    
    select * from reviews;
    


select name, skill, skill_rating, comments
    from reviews
    unpivot(skill_rating for skill in (acting_rating,  comedy_rating)) 
    unpivot(comments for skill_comments in (acting_comments,comedy_comments)) 

--Following where clause is added to filter the irrelevant comments due to multiple unpivots

where substr(skill,1,position('_',skill)-1) = substr(skill_comments,1,position('_',skill_comments)-1) 
     order by name;

will produce produce the desired results, but with data that has NULLs, the unpivoted rows that have NULLs go missing from the output:

NAME    SKILL   SKILL_RATING    COMMENTS
abc COMEDY_RATING   1   NO
lmn ACTING_RATING   1   what
xyz ACTING_RATING   3   some
xyz COMEDY_RATING   1   haha
Saqib Ali
  • 3,953
  • 10
  • 55
  • 100

5 Answers5

6

If all you need to solve is for the table specified in the question - you can do it manually with a set of UNION ALL:

select NAME
  , 'ACTING_RATING' as SKILL, ACTING_RATING as SKILL_RATING, ACTING_COMMENTS as SKILL_COMMENTS
from DATA
union all
select NAME
  , 'COMEDY_RATING', COMEDY_RATING, COMEDY_COMMENTS
from DATA
union all
select NAME
  , 'MUSICAL_PERFORMANCE_RATING', MUSICAL_PERFORMANCE_RATING, MUSICAL_PERFORMANCE_COMMENTS
from DATA
Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
  • 1
    Noob so can't upvote or comment on posts - but Felipe's answer with the simple UNION ALL is actually better imho as it is simpler and more effective. Both answers involve multiple aggregation streams, the unpivot also does additional steps - it looks cool... but its not as efficient! You can see this by running both back to back and looking at what snowflake does in the query profile. – Matt Prince Mar 02 '21 at 14:49
  • 1
    Thanks @MattPrince - btw, I wrote a dynamic table UDF for unpivot now too: https://stackoverflow.com/a/66431516/132438 (welcome to Stack Overflow!) – Felipe Hoffa Mar 02 '21 at 20:37
  • @FelipeHoffa I allowed myself to compare this approach vs `INSERT ALL` when the results is supposed to be saved in a table. – Lukasz Szozda May 08 '22 at 08:49
2

This is a basic script and should give the desired output

create or replace table reviews(name varchar(50), acting_rating int, acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, 'something', 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, 'hahaha');
    
    select * from reviews;
    


select name, skill, skill_rating, comments
    from reviews
    unpivot(skill_rating for skill in (acting_rating,  comedy_rating)) 
    unpivot(comments for skill_comments in (acting_comments,comedy_comments)) 

--Following where clause is added to filter the irrelevant comments due to multiple unpivots

where substr(skill,1,position('_',skill)-1) = substr(skill_comments,1,position('_',skill_comments)-1) 
     order by name;
Iqra Ijaz
  • 261
  • 1
  • 3
  • Iqra, thanks for the response. But this query will only work if the if the Rating and Comments are prefixed with the same string e.g. acting_command and acting_rating. However our real data is not like this. There is no set prefix. And suggestions on how to handle this case? – Saqib Ali Oct 13 '20 at 17:06
  • Iqra, also this won't work for cases where the comment field is NULL. Which is very likely to happen, since lot of customer only rate, but do not enter in the comments. – Saqib Ali Oct 13 '20 at 17:44
  • Can you give an example data for both cases? – Iqra Ijaz Oct 14 '20 at 07:26
  • hello Iqra, I added an example of the data with NULLS. You will see that the output doesn't contain the unpivoted rows that have NULL values in them. – Saqib Ali Oct 14 '20 at 21:22
2

If the goal is to store the unpivoted result as a table then INSERT ALL could be used to unpivot mutliple columns at once:

Setup:

create or replace table reviews(
     name varchar(50), acting_rating int,
     acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, NULL, 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, NULL);
    
select * from reviews;

Query:

CREATE OR REPLACE TABLE reviews_transposed(
    name VARCHAR(50)
    ,skill TEXT
    ,skill_rating INT
    ,skill_comments TEXT
);

INSERT ALL 
    INTO reviews_transposed(name, skill, skill_rating, skill_comments)
         VALUES (name, 'ACTING_RATING', acting_rating, acting_comments)
    INTO reviews_transposed(name, skill, skill_rating, skill_comments)
         VALUES (name, 'COMEDY_RATING', comedy_rating, comedy_comments)
SELECT *
FROM reviews;

SELECT *
FROM reviews_transposed;

Before:

enter image description here

After:

enter image description here


This approach has one significant advantage over UNION ALL approach proposed by Felippe, when saving into table (the number of table scans and thus partition read is growing for each UNION ALL wheareas INSERT ALL scans source table only once.

INSERT INTO reviews_transposed
select NAME
  , 'ACTING_RATING' as SKILL, ACTING_RATING as SKILL_RATING, ACTING_COMMENTS as SKILL_COMMENTS
from reviews
union all
select NAME
  , 'COMEDY_RATING', COMEDY_RATING, COMEDY_COMMENTS
from reviews;

enter image description here

vs INSERT ALL

enter image description here

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
1

Back in TSQL days i'd just use a CROSS APPLY. The nearest equivalent in snowflake would be something like:

create or replace TEMPORARY table reviews(name varchar(50), acting_rating int, acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, NULL, 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, NULL);
    
SELECT  R.NAME
        ,P.VALUE:SKILL::VARCHAR(100)        AS SKILL
        ,P.VALUE:RATING::NUMBER             AS RATING
        ,P.VALUE:COMMENTS::VARCHAR(1000)    AS COMMENTS
FROM reviews R
,TABLE(FLATTEN(INPUT     => ARRAY_CONSTRUCT(
                                                OBJECT_CONSTRUCT('SKILL','COMEDY','RATING',R.COMEDY_RATING,'COMMENTS',R.COMEDY_COMMENTS),
                                                OBJECT_CONSTRUCT('SKILL','ACTING','RATING',R.ACTING_RATING,'COMMENTS',R.ACTING_COMMENTS)
                                            )
              )) AS P;

This only hits the source table once and preserves NULLs.

ResultSet

0

I've had same problem, Here is my solution for unpivoting by two categories AND keeping nulls:

First you replace NULL's with some string, for example: 'NULL'

Then brake the two unpivots into two separate cte's and create common category column to join them again later, 'skill' in your case.

Lastly, join the two cte's by name and skill category, replace the 'NULL' string with actual NULL

create or replace table reviews(name varchar(50), acting_rating int, acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, 'something', 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, 'hahaha');

  WITH base AS (SELECT name
                     , acting_rating
                     , IFNULL(acting_comments, 'NULL') AS acting_comments
                     , comedy_rating
                     , IFNULL(comedy_comments, 'NULL') AS comedy_comments
                  FROM reviews
               )
     , skill_rating AS (SELECT name
                             , REPLACE(skill, '_RATING', '') AS skill
                             , skill_rating
                          FROM base
                              UNPIVOT (skill_rating FOR skill IN (acting_rating, comedy_rating))
                       )
     , comments AS (SELECT name
                         , REPLACE(skill_comments, '_COMMENTS', '') AS skill
                         , comments
                      FROM base
                          UNPIVOT (comments FOR skill_comments IN (acting_comments,comedy_comments))
                   )

SELECT s.name
     , s.skill
     , s.skill_rating
     , NULLIF(c.comments, 'NULL') AS comments
  FROM skill_rating AS s
  JOIN comments AS c
       ON s.name = c.name
           AND s.skill = c.skill
 ORDER BY name;

The result:

name    skill   skill_rating    comments
abc ACTING  4   <null>
abc COMEDY  1   NO
lmn ACTING  1   what
lmn COMEDY  4   <null>
xyz ACTING  3   some
xyz COMEDY  1   haha