PROC SQL - Transposing Data into Columns when Condition Met

Question

My data is structured as below, where each unique ID will have a row displaying the balance on the last day of the month :

ID      Day_Key    Balance
23412   20171229   50000
23412   20180131   45000
23412   20180228   40000   
27435   20171229   100000    
27435   20180131   80000
27435   20180228   60000

I want to create a table where each unique ID is displayed on one row, with columns indicating the balance at each month, like so :

ID     DEC17    JAN 18    FEB18
23412  50000    45000     40000
27435  100000   80000     60000

**UPDATE*

My current code is shown below

PROC SQL;
CREATE TABLE BAL_TRANSPOSE AS 
SELECT DISTINCT ID,
        MAX(SUB_EY17) AS EY17,
      MAX(SUB_JAN18) AS JAN18,
      MAX(SUB_FEB18) AS FEB18,
      MAX(SUB_MAR18) AS MAR18,
      MAX(SUB_APR18) AS APR18,
      MAX(SUB_MAY18) AS MAY18,
      MAX(SUB_JUN18) AS JUN18,
      MAX(SUB_JUL18) AS JUL18,
      MAX(SUB_AUG18) AS AUG18,
      MAX(SUB_SEP18) AS SEP18,
      MAX(SUB_OCT18) AS OCT18,
      MAX(SUB_NOV18) AS NOV18,
      MAX(SUB_EY18) AS EY18
FROM (SELECT DISTINCT ID,
    CASE WHEN DAY_KEY = 20171229 THEN OUTSTANDING_BALANCE END AS SUB_EY17,
    CASE WHEN DAY_KEY = 20180131 THEN OUTSTANDING_BALANCE END AS SUB_JAN18,
    CASE WHEN DAY_KEY = 20180228 THEN OUTSTANDING_BALANCE END AS SUB_FEB18,
    CASE WHEN DAY_KEY = 20180330 THEN OUTSTANDING_BALANCE END AS SUB_MAR18,
    CASE WHEN DAY_KEY = 20180430 THEN OUTSTANDING_BALANCE END AS SUB_APR18,
    CASE WHEN DAY_KEY = 20180531 THEN OUTSTANDING_BALANCE END AS SUB_MAY18,
    CASE WHEN DAY_KEY = 20180629 THEN OUTSTANDING_BALANCE END AS SUB_JUN18,
    CASE WHEN DAY_KEY = 20180731 THEN OUTSTANDING_BALANCE END AS SUB_JUL18,
    CASE WHEN DAY_KEY = 20180831 THEN OUTSTANDING_BALANCE END AS SUB_AUG18,
    CASE WHEN DAY_KEY = 20180928 THEN OUTSTANDING_BALANCE END AS SUB_SEP18,
    CASE WHEN DAY_KEY = 20181031 THEN OUTSTANDING_BALANCE END AS SUB_OCT18,
    CASE WHEN DAY_KEY = 20181130 THEN OUTSTANDING_BALANCE END AS SUB_NOV18,
    CASE WHEN DAY_KEY = 20181231 THEN OUTSTANDING_BALANCE END AS SUB_EY18
FROM TABLE1) AS SUB
GROUP BY ID;   
QUIT;

The new columns are created, however only null values appear. Below is the results I am seeing (trimmed for readability). The query returns over 1m records but from what I can see, all have 0 values. I have tested the data and know that every ID should have a value for each day_key.

ID      EY17    JAN18        FEB18       MAR18         APR18   
1111    -       -            -            -            -
2222    -       -            -            -            -
3333    -       -            -            -            -
4444    -       -            -            -            -
5555    -       -            -            -            -

This is a pretty straightforward PROC TRANSPOSE https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/ — Reeza, Aug 22 '19 at 15:02
Sorry Reeza, I am not great with SAS hence trying to code around this in PROC SQL. Have had a few attempts at your link but I cant get the hang of it yet... — ScottCee, Aug 23 '19 at 15:01
Richard - by 'condition' I meant for every new instance of day_key, create a column that stores the balance for that day. — ScottCee, Aug 23 '19 at 15:56

score 2 · Answer 1 · edited Aug 22 '19 at 15:16

2

Proc tranpose is best for this scenario. You were also close with SQL. All you need was small change by adding a aggregate function.

 PROC SQL;
  CREATE TABLE BAL_TRANSPOSE AS 
  SELECT ID,
       max(CASE WHEN DAY_KEY = 20171229 THEN BALANCE END) AS DEC17,
       max(CASE WHEN DAY_KEY = 20180131 THEN BALANCE END) AS JAN18,
        max(CASE WHEN DAY_KEY = 20180228 THEN BALANCE END) AS FEB18,
  FROM TABLE1
  GROUP BY ID    
 QUIT;

edited Aug 22 '19 at 15:16

Llex

1,770
1
12
27

answered Aug 22 '19 at 15:13

Kiran

3,255
3
14
21

Thanks Alexey, I have tried this however now I am not receiving any values pulled through? – ScottCee Aug 23 '19 at 14:43

score 2 · Answer 2 · answered Aug 22 '19 at 15:14

2

Original SQL would work with aggregated function as the process is known as conditional aggregation, a common form of pivoting data from long to wide when columns are known and a handful in number.

PROC SQL;
   CREATE TABLE BAL_TRANSPOSE AS 
   SELECT ID,
          MAX(CASE WHEN DAY_KEY = 20171229 THEN BALANCE END) AS DEC17,
          MAX(CASE WHEN DAY_KEY = 20180131 THEN BALANCE END) AS JAN18,
          MAX(CASE WHEN DAY_KEY = 20180228 THEN BALANCE END) AS FEB18
   FROM TABLE1
   GROUP BY ID    
QUIT;

However, with SAS proc sql you may need to use a subquery:

PROC SQL;
   CREATE TABLE BAL_TRANSPOSE AS 
   SELECT ID, 
          MAX(SUB_DEC17) AS DEC17,
          MAX(SUB_JAN18) AS JAN18,
          MAX(SUB_FEB18) AS FEB18
   FROM (SELECT ID,
                CASE WHEN DAY_KEY = 20171229 THEN BALANCE END AS SUB_DEC17,
                CASE WHEN DAY_KEY = 20180131 THEN BALANCE END AS SUB_JAN18,
                CASE WHEN DAY_KEY = 20180228 THEN BALANCE END AS SUB_FEB18
         FROM TABLE1) AS sub
   GROUP BY ID    
QUIT;

Actually your original query should have erred out since you included non-aggregated columns in SELECT that did not appear in GROUP BY -a violation in ANSI-SQL standards. SAS likely converted your attempted aggregate query to unit level (i.e., ignored GROUP BY) as possibly shown with log notes or warnings.

answered Aug 22 '19 at 15:14

Parfait

104,375
17
94
125

1

SAS will allow you to include extra variables in the select list and automatically remerge the summary statistics back to the detailed records. It will make a note in the log that it has done this for you. – Tom Aug 22 '19 at 16:21
@Tom, exactly! That is a SAS workaround and since OP's query developed no summary stats, output remained at unit level. But all major RDBMS's that comply will fail with OP's attempted query. This is important to note for future readers who attempt to use same SAS queries to backend external database connections. – Parfait Aug 22 '19 at 16:51
@Parfait, thank you for your help. I have updated my original post with my new code, however I am still having issues pulling through the values? – ScottCee Aug 23 '19 at 14:34
1

Is *DAY_KEY* a string or number type? If former, add quotes in `CASE` statements. – Parfait Aug 23 '19 at 15:27
DAY_KEY is numeric, I have added my results from your suggestion to the OP. Appreciate all your help with this. – ScottCee Aug 23 '19 at 15:55
What is *OUTSTANDING_BALANCE*? Is it different from *Balance* in table at very top? – Parfait Aug 23 '19 at 20:23

score 2 · Answer 3 · answered Aug 22 '19 at 16:15

Transposing a time dimension into an column identifier can often mean a report is desired instead of a data transformation.

Consider using TABULATE or REPORT

data have;
attrib 
  id length=8
  day_key length=4 informat=yymmdd8. format=yymmdd10.
  balance length=8 format=comma12.
;
input
ID      Day_Key    Balance; datalines;
23412   20171229   50000
23412   20180131   45000
23412   20180228   40000   
27435   20171229   100000    
27435   20180131   80000
27435   20180228   60000
run;

ods html;

proc tabulate data=have;
  class id day_key;
  var balance;
  format day_key monyy7.;
  table 
    id = ''
    ,
    day_key='' * balance='' * max='' * f=comma12.
    /
    box = 'id'
  ;
run;

** -- OR --;

proc report data=have;
  columns id (balance, day_key);
  define id / group;
  define day_key / ' ' across format=monyy7.;
  define balance / ' ' analysis max;
run;

Nice! I have saved this solution to my personal library! – Parfait Aug 22 '19 at 16:54 — Parfait, Aug 22 '19 at 16:54

score 1 · Answer 4 · edited Aug 22 '19 at 15:19

you can use proc transpose:

/*prepare*/
data g;
input ID  Day_Key   Balance;
datalines4;
23412   20171229   50000
23412   20180131   45000
23412   20180228   40000   
27435   20171229   100000    
27435   20180131   80000
27435   20180228   60000
;;;;
run;

proc sort ;
by id;
run;

/*you need*/
proc transpose data=g out=g2;
id Day_Key;
by id;
run;

You will get:

+-------+----------+----------+----------+
|  ID   | 20171229 | 20180131 | 20180228 |
+-------+----------+----------+----------+
| 23412 |    50000 |    45000 |    40000 |
| 27435 |   100000 |    80000 |    60000 |
+-------+----------+----------+----------+

So, you can format your dates , that give you names "JAN18" and e.g.

In addition, you could use IDLABEL.

Maybe worth creating a new variable to store the variable names and labels cleanly using IDLABEL? — Reeza, Aug 22 '19 at 15:02

PROC SQL - Transposing Data into Columns when Condition Met

4 Answers4