0

I have a dataset of this kind, this dataset contains some missing values (I'm representing them with X).

ID  VA  VB  ... Vn      |       ID  VA  VB  ... Vn  |   
1   a1  b1  ... n1      |       1   a1  X   ... n1  |
2   a2  b2  ... n2      |===    2   X   b2  ... X   |
3   a3  b3  ... n3      |===    3   a3  b3  ... n3  |
..................      |       ..................  |
N   aN  bN  ... nN      |       N   X   bN  ... nN  |

I want to add observations by ID using only one variable column, I will call that variable VAR: something like an inverted proc format by ID; var VAR;.

ID  VAR
1   a1
..
1   n1 
2   b2
..
3   a3
3   b3
..
3   n3
..
N   bN
..  
N   nN

so I tried to split OLDdataset in different datasets (NEWa NEWb ...NEWn) where, in each dataset we have all the not-missing observations stored in a column called VAR. Then I will merge NEWa NEWb ... NEWn in NEWdataset and I will apply a proc sort for restoring the order by ID.

The problem arised when I realized that "n" is not known prior the analysis 'cause I want to setup a generalized code that won't work for only one dataset, and VA VB VN are the result of a proc format step.

If n is knows I would use something like this:

data NEWa NEWb NEWc;
set OLDdataset;
array try[3] VA VB VC ;
if try[1] ne '.' then output NEWa;
if try[1] ne '.' then output NEWb;
if try[1] ne '.' then output NEWc;
run;

But now i need some iterative naming system that maybe could sounds like:

data NEW_i;                     <-------- "i" must be assigned
set OLDdataset;
array try[*] V: ;
do i=1 dim(try);
if try[i] ne '.' then output NEW_i; <---- "i" must be assigned
run;

Hope this sounds clear. Any hints? Thanks.

stat
  • 699
  • 3
  • 10
  • A clarification: are the categories in each var are always distinct from each other? IOW: Can you have the same value for two different variables? – Leo Sep 04 '15 at 04:49

2 Answers2

0

If I've understood the scenario correctly you are dealing with numeric variables, so on that basis you might not necessarily have to split the data by variable into different data sets, as you should be able to use the nmiss() function which returns the number of missing numeric values, e.g. in a subsetting if:

data newdataset;
  set olddataset;
  if nmiss(&varlist)=0;
run;

where &varlist could be created just before using olddataset in the above data step (as you said the number of variables can be dynamic) using:

/* note literals are case sensitive */
proc sql noprint;
  select name into :varlist separated by ','
  from dictionary.columns
  where libname='WORK'
    and memname='OLDDATASET'
    /* use following if you also have non-numeric variables */
    /* and type='num' */
  ;
quit;

You can use the following to see the contents of &varlist:

%put varlist=&varlist;

or even use the following option before the macro variable is referenced:

options symbolgen;
Amir
  • 880
  • 1
  • 6
  • 15
  • dear @Amir special thanks for the try! anyway I'm pretty sure that this won't work because of this: if nmiss(&varlist)=0; all my column will return 0, because they all have at least a missing value :) also, I'm working with character variables (categorical multilevel), and I'm so sorry for not having specified that in the text. – stat Nov 28 '14 at 09:12
0

Try a proc transpose instead, you can delete the missing in a second step or add a where clause to the out data set. If you want to create subsets after that based on ID separate them out in a data step.

*generate sample data;
data have;
array V(20) v1-v20;
do ID=1 to 10;
    do i=1 to 20;
    v(i)=rand('normal', 0, 1);
    if v(i) < -2 then v(i)=.;
    end;
    output;
end;
drop i;
run;

*Flip data;
proc transpose data=have out=want(where=(var1 ne .)) prefix=Var;
    by id;
run;

*Separate into different data sets;
data _null_;
    set want;
    by id;
    call execute(catt("data var", id, "; set want; where id=", id, ";run;"));
run;
Reeza
  • 20,510
  • 4
  • 21
  • 38
  • dear @Reese this is a great try, thanks for your help. I'm not hiding that I'm pretty new with SAS environment and never heard about call execute() before. :) – stat Nov 28 '14 at 09:16