1

This is for a case study.

I have the following table, T1 with two columns: c1, c2

    C1     C2
    1       3
    5       2
    4       10

I want to create a table T2 with both C1, C2, and a new column C3 defined in a way that

 if C1 > 3  then C3 = "C1_big"
 if C2 > 3  then C3 = "C2_big"
 else C3 = "not_big"

so using our table T1, we get

    C1     C2     C3
    1       3     "not_big"
    5       2     "C1_big"
    4       10    "C1_big"
    4       10    "C2_big"

notice that the last row satisfies conditions for both "C1_big", and "C2_big", so we have two rows for that.

I was thinking of using CASE WHEN, but then it only applies to mutually exclusive events. For our case, "C1_big", "C2_big" can both happen for a row, so it doesn't need to be mutually exclusive.

Is there a way to achieve this in SQL query? Any help would be greatly appreciated.

user98235
  • 830
  • 1
  • 13
  • 31

1 Answers1

1

In Presto/Trino this can be achieved using array functions and unnest (using succinct syntax here):

-- sample data
with dataset (c1, c2) as (
    values (1, 3),
        (5, 2),
        (4, 10)
)

-- query
select *
from dataset,
     unnest(
        if(
             c1 <= 3 and c2 <=3, array['not_big'],
             filter(array[if(c1 > 3, 'C1_big'), if(c2 > 3, 'C2_big')], x -> x is not null)
        )) as t(c3);

Without using Presto-specific functions - you can use union all:

-- query
select *, 'not_big' c3
from dataset
where c1 <= 3
  and c2 <= 3

union all

select *, 'C1_big' c3
from dataset
where c1 > 3

union all

select *, 'C2_big' c3
from dataset
where c2 > 3;
Guru Stron
  • 102,774
  • 10
  • 95
  • 132
  • As far as time complexity goes, does the first one take less time and memory than 2nd one? – user98235 Mar 09 '23 at 05:10
  • 1
    @user98235 in terms of time complexity they should be equal, in terms of the execution speed - depends on the SQL engine - if it can somehow find query similarity and optimize it then they should be similar, otherwise it will need to process the same data 3 times, hence it will be slower ~ 3 times (`explain analyze` confirms it in Trino) – Guru Stron Mar 09 '23 at 05:17
  • thanks. (i thought of using presto, so this should work) one last question - why is "x -> x is not null" needed here? Not sure what this is for. – user98235 Mar 09 '23 at 05:19
  • 1
    @user98235 check the linked docs and find `filter` function there - `x -> x is not null` is the second parameter: function accepting array element and returning bool. – Guru Stron Mar 09 '23 at 05:21
  • oh, got it. thanks! I will read the linked doc. – user98235 Mar 09 '23 at 05:21
  • Sorry to bother you again, but do you mind taking a look at https://stackoverflow.com/questions/76169414/getting-the-max-date-per-each-key-per-user ? Thanks. – user98235 May 04 '23 at 03:30
  • @user98235 - NP. Have tried to answer the question. – Guru Stron May 04 '23 at 10:32