case when in sqldf / R

Question

If I have DF

I try to use sqldf as follows

 Test <- sqldf("                SELECT A,
                                    case when A = '1' and B = '1' then Col else NULL end as Test_1, 
                                    case when A = '1' and B = '2' then Col else NULL end as Test_2, 
                                    case when A = '1' and B = '3' then Col else NULL end as Test_3, 
                                    case when A = '2' and B = '1' then Col else NULL end as Test_4, 
                                     case when A = '2' and B = '2' then Col else NULL end as Test_5, 
                                      case when A = '2' and B = '3' then Col else NULL end as Test_6 

                                   FROM  DF
                                   group by A;")

However, it only returns result for each case of A and the other columns NULL so instead of my desired

A Test1 Test2 Test3 Test4 Test5 Test6
1   A     C     E    Null  NULL  NULL
2  NULL  NULL  NULL   D      B    F

I only get one result for each A

A Test1 Test2 Test3 Test4 Test5 Test6
1   G    NULL  NULL  Null  NULL  NULL
2  NULL  NULL  NULL  NULL   B    NULL

What am I doing wrong and is there a way to get the format that I require?

You are grouping by *A* which carry only two values. Try removing `GROUP BY` as you do no aggregation anyway. — Parfait, Aug 24 '17 at 14:41
Thanks. True but that still doesn't solve the problem for me — Rtab, Aug 24 '17 at 14:55
Your desired output makes no sense to me - if I remove the `group by` I get a result that makes sense (with 8 rows). The `Test_1` column is 'A' in the first row and 'G' in the 7th row, both rows of the input have A is 1 and B is 1. You somehow want to aggregate this up to just 'G' - so you need to use an *aggregate function* in the `select` with the `group by` - and you need that function to pick 'G' is chosen, not 'A' in the `Test_1` column. Similarly why is 'B' chosen, not 'H', in the `Test_5` column? Both rows 2 and 8 match the Test_5 case statement. — Gregor Thomas, Aug 24 '17 at 15:20
Apologies. I made a typo in my submission. Have edited to correct now. I want the results of my select statement to be one row for when A = 1 and one row for when A = 2. Then I want to select A and the various corresponding values for A from Col as additional columns in the same statement. Does that make more sense to you? — Rtab, Aug 24 '17 at 15:31
Similarly, you'll want aggregate functions for the other columns that ignore the NULLs and pick the values that are there. You can use `max()` or `min()` to get close to your desired output. `max()` will pick G over A and H over B, min will do the opposite. — Gregor Thomas, Aug 24 '17 at 15:31
Just use `max()` or `min()` around you case statements. [Read up on aggregate functions](http://www.sqlcourse2.com/agg_functions.html), whenever you are using a `group by`, every column in the `select` should either by part of the grouping or the result of an aggregate. — Gregor Thomas, Aug 24 '17 at 15:33

score 2 · Accepted Answer · answered Aug 24 '17 at 18:10

Known as conditional aggregation in SQL (often used for pivoting data), as @Gregor comments simply run an aggregate like MAX() (even MIN() will work) around the CASE statements:

SELECT A,
       MAX(CASE WHEN A = '1' AND B = '1' THEN Col ELSE NULL END) as Test_1, 
       MAX(CASE WHEN A = '1' AND B = '2' THEN Col ELSE NULL END) as Test_2, 
       MAX(CASE WHEN A = '1' AND B = '3' THEN Col ELSE NULL END) as Test_3, 
       MAX(CASE WHEN A = '2' AND B = '1' THEN Col ELSE NULL END) as Test_4, 
       MAX(CASE WHEN A = '2' AND B = '2' THEN Col ELSE NULL END) as Test_5, 
       MAX(CASE WHEN A = '2' AND B = '3' THEN Col ELSE NULL END) as Test_6
FROM DF
GROUP BY A

case when in sqldf / R

1 Answers1