-1

I have a table with very many columns but for the in order to explain my problem I will use this simple table.

data test;
    input a b c;
    datalines;
    0 0 0
    1 1 1   
    . 4 2
    ;        
run;

I need to calculate the common summary statistic as min, max and number of missing. But I also need to calculate some special numbers as number of values above a certain level( in this example >0 and >1.

I can use proc mean but it only give me results for normal things like min, max etc.

What I want is result on the following format:

var minval maxval nmiss n_above1 n_above2
a    0     1   1     1        0
b    0     4   0     2        1
c    0     2   0     2        1   

I have been able to make this informat for one variable with this rather stupid code:

data result;
    set test(keep =b) end=last;
    variable = 'b';
    retain minval maxval;
    if _n_ = 1 then do;
        minval = 1e50;
        maxval = -1e50;
    end;
    if minval > b then minval = b;
    if maxval < b then maxval = b;
    if b=. then nmiss+1;
    if b>0 then n_above1+1;
    if b>2 then n_above2+1;
    if last then do;        
        output;
    end;
    drop b;
run;

This produce the following table:

variable minval maxval nmiss n_above1 n_above2
b        0      4      0     2        1

I know there has to be better way do this. I am used to Python and Pandas. There I will only loop through each variable, calculate the different summary statistick and append the result to a new dataframe for each variable.

I can probably also use proc sql. The next example

proc sql;
    create table res as
    select count(case when a > 0 then 1 end) as n_above1_a,
           count(case when b > 0 then 1 end) as n_above1_b,
           count(case when c > 0 then 1 end) as n_above1_c
    from test;
quit;

This gives me:

n_above1_a n_above1_b n_above1_c
1          2          2

But this do not solve my problem.

fossekall
  • 521
  • 1
  • 10
  • 27
  • Are you familiar with SQL? SAS has a PROC SQL which would be one way to approach this. If you want to stick with PROC MEANS, you could make binary variables `GreaterThan0` and `GreaterThan2` and use PROC MEANS to sum them. – Quentin Aug 04 '17 at 23:59
  • I have used proc sql but I cannot see how that can solve my problem. I have only used it to make sum,max and min. But I cannot see how I will count columns which is larger than x. – fossekall Aug 05 '17 at 05:56
  • Can't test now. Couldn't you do `select min(a), max(a), sum(missing(a)), sum(a>0)` ? – Quentin Aug 05 '17 at 12:11
  • Do not know. I will try tomorrow. – fossekall Aug 05 '17 at 12:28

1 Answers1

2

If you add an unique identifier to each row then you can just use PROC TRANSPOSE and PROC SQL to get your result.

data test;
  input a b c;
  id+1;
datalines;
0 0 0
1 1 1   
. 4 2
;        
proc transpose data=test out=tall ;
 by id ;
run;
proc sql noprint ;
  create table want as 
    select _name_
         , min(col1) as minval
         , max(col1) as maxval
         , sum(missing(col1)) as nmiss
         , sum(col1>1) as n_above1
         , sum(col1>2) as n_above2
    from tall
    group by _name_
  ;
quit;

Result

Obs    _NAME_    minval    maxval    nmiss    n_above1    n_above2

 1       a          0         1        1          0           0
 2       b          0         4        0          1           1
 3       c          0         2        0          1           0
Tom
  • 47,574
  • 2
  • 16
  • 29