2

If I have DF

   A B  Col
    1 1  A
    2 2  B
    1 2  C
    2 1  D
    1 3  E
    2 3  F

I try to use sqldf as follows

 Test <- sqldf("                SELECT A,
                                    case when A = '1' and B = '1' then Col else NULL end as Test_1, 
                                    case when A = '1' and B = '2' then Col else NULL end as Test_2, 
                                    case when A = '1' and B = '3' then Col else NULL end as Test_3, 
                                    case when A = '2' and B = '1' then Col else NULL end as Test_4, 
                                     case when A = '2' and B = '2' then Col else NULL end as Test_5, 
                                      case when A = '2' and B = '3' then Col else NULL end as Test_6 

                                   FROM  DF
                                   group by A;")

However, it only returns result for each case of A and the other columns NULL so instead of my desired

A Test1 Test2 Test3 Test4 Test5 Test6
1   A     C     E    Null  NULL  NULL
2  NULL  NULL  NULL   D      B    F

I only get one result for each A

A Test1 Test2 Test3 Test4 Test5 Test6
1   G    NULL  NULL  Null  NULL  NULL
2  NULL  NULL  NULL  NULL   B    NULL

What am I doing wrong and is there a way to get the format that I require?

Rtab
  • 123
  • 10
  • You are grouping by *A* which carry only two values. Try removing `GROUP BY` as you do no aggregation anyway. – Parfait Aug 24 '17 at 14:41
  • Thanks. True but that still doesn't solve the problem for me – Rtab Aug 24 '17 at 14:55
  • Your desired output makes no sense to me - if I remove the `group by` I get a result that makes sense (with 8 rows). The `Test_1` column is 'A' in the first row and 'G' in the 7th row, both rows of the input have A is 1 and B is 1. You somehow want to aggregate this up to just 'G' - so you need to use an *aggregate function* in the `select` with the `group by` - and you need that function to pick 'G' is chosen, not 'A' in the `Test_1` column. Similarly why is 'B' chosen, not 'H', in the `Test_5` column? Both rows 2 and 8 match the Test_5 case statement. – Gregor Thomas Aug 24 '17 at 15:20
  • Apologies. I made a typo in my submission. Have edited to correct now. I want the results of my select statement to be one row for when A = 1 and one row for when A = 2. Then I want to select A and the various corresponding values for A from Col as additional columns in the same statement. Does that make more sense to you? – Rtab Aug 24 '17 at 15:31
  • Similarly, you'll want aggregate functions for the other columns that ignore the NULLs and pick the values that are there. You can use `max()` or `min()` to get close to your desired output. `max()` will pick G over A and H over B, min will do the opposite. – Gregor Thomas Aug 24 '17 at 15:31
  • Just use `max()` or `min()` around you case statements. [Read up on aggregate functions](http://www.sqlcourse2.com/agg_functions.html), whenever you are using a `group by`, every column in the `select` should either by part of the grouping or the result of an aggregate. – Gregor Thomas Aug 24 '17 at 15:33
  • Will try. Thanks Gregor – Rtab Aug 24 '17 at 15:34

1 Answers1

2

Known as conditional aggregation in SQL (often used for pivoting data), as @Gregor comments simply run an aggregate like MAX() (even MIN() will work) around the CASE statements:

SELECT A,
       MAX(CASE WHEN A = '1' AND B = '1' THEN Col ELSE NULL END) as Test_1, 
       MAX(CASE WHEN A = '1' AND B = '2' THEN Col ELSE NULL END) as Test_2, 
       MAX(CASE WHEN A = '1' AND B = '3' THEN Col ELSE NULL END) as Test_3, 
       MAX(CASE WHEN A = '2' AND B = '1' THEN Col ELSE NULL END) as Test_4, 
       MAX(CASE WHEN A = '2' AND B = '2' THEN Col ELSE NULL END) as Test_5, 
       MAX(CASE WHEN A = '2' AND B = '3' THEN Col ELSE NULL END) as Test_6
FROM DF
GROUP BY A
Parfait
  • 104,375
  • 17
  • 94
  • 125