6

i would like to know if it is possible to transpose efficiently from wide to long using proc sql in sas.

I'm aware that proc transpose is much quicker that the method i suggest below. But one of my objective would be to avoid storing the transposed table.

Let's say for example, that i have table1 as

Id|   A|   B|   C|  D    
_____________________
 1|  100|3500|6900| 10300
 2|  200| 250| 300| 350
 3|  150|  32| 400| 204
 4|  200| 800|1400| 2000

and i want to turn it into

id|col1|  col2|
______________
 1|   A|   100|
 1|   B|  3500|
 1|   C|  6900|
 1|   D| 10300|
 2|   A|   200|
 2|   B|   250|
 2|   C|   300|
 2|   D|   350|
 3|   A|   150|
 3|   B|    32|
 3|   C|   400|
 3|   D|   204|
 4|   A|   200|
 4|   B|   800|
 4|   C|  1400|
 4|   D|  2000|

I could do this;

select id, 'A' as col1, A as col2
from table1
where A ~=""
union select id, 'B' as col1, B as col2
from table1
where B ~=""
etc

but it is highly inefficient.

Any idea? Thanks.

DJJ
  • 2,481
  • 2
  • 28
  • 53
  • The `UNION` will remove any duplicates, if you don't care if there are duplicates, then you could use `UNION ALL` which may help with performance. – Taryn Aug 19 '13 at 17:57
  • Looks very much like a SQL `unpivot`, but wiring that into `PROC SQL` might be difficult. – Philip Kelley Aug 19 '13 at 17:59

2 Answers2

10

If you're in SAS, use PROC TRANSPOSE for this option. There is no particularly good way to do this in PROC SQL; while many SQL variants have their own way to pivot data, SAS has PROC TRANSPOSE and expects you to use it.

The SAS datastep also does this very efficiently, perhaps even better than PROC TRANSPOSE. Here's an example, including creating a view as noted in the comments.

data want/view=want;
set have;
array vars a b c d;                  *array of your columns to transpose;
do _t = 1 to dim(vars);              *iterate over the array (dim(vars) gives # of elements);
  if not missing(vars[_t]) then do;  *if the current array element's value is nonmissing;
    col1=vname(vars[_t]);            *then store the variable name from that array element in a var;
    col2=vars[_t];                   *and store the value from that array element in another var;
    output;                          *and finally output that as a new row;
  end;
end;
drop a b c d _t;                     *Drop the old vars (cols) and the dummy variable _t;
run;
SnareChops
  • 13,175
  • 9
  • 69
  • 91
Joe
  • 62,789
  • 6
  • 49
  • 67
  • very interesting. Is there a way then to avoid storing the table want. As my table is a quite big and i have more steps ahead. – DJJ Aug 19 '13 at 18:24
  • Sure, you could create it as a view. – Joe Aug 19 '13 at 19:38
  • So Proc transpose or proc data is superior to proc sql as far as transpose is concerned.http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001278887.htm. Thanks a lot. – DJJ Aug 19 '13 at 20:38
  • This code is awesome. Could you explain it a little bit more? Is there a typo at line 5? **missing(vars[_t])** instead of missing(vars) – DJJ Aug 20 '13 at 08:51
  • I gave you a check for a good solution to the OP's question, but I have to challenge the notion that the data step is "better" than proc transpose. Even this very simple problem required far more code than needed by proc transpose, as you can see in the other answer posted. And I seriously doubt that your data step solution runs faster than proc transpose. – floydn Feb 02 '16 at 13:22
  • @floydn Certainly wouldn't suggest it is better in terms of code. In terms of run efficiency it varies: some data will be more efficient in one or the other. This can be done as a view which is a big advantage over PROC TRANSPOSE in cases where a view is helpful since it doesn't have to write the data set out anywhere on disk. – Joe Feb 02 '16 at 13:26
1

I actually did something just like this today. Try doing this,

proc transpose data = ORIGINAL_DATA;
        out = NEW_DATA;
    by id;
    VAR A-D;
run;

I think this should work.

Skyler
  • 160
  • 1
  • 2
  • 13