-1

Suppose I have these data read into SAS:

Raw Data

I would like to list each unique name and the number of months it appeared in the data above to give a data set like this:

Frequency Data

I have looked into PROC FREQ, but I think I need to do this in a DATA step, because I would like to be able to create other variables within the new data set and otherwise be able to manipulate the new data.

  • Your last sentence makes it impossible to give a good answer to this question. Without knowing what further stats you want it's impossible to say what the best method is. You could use PROC MEANS/SQL/FREQ/TABULATE/SUMMARY/REPORT and of course, a data step to count numbers. – Reeza Jun 29 '17 at 22:17
  • Oh...I forgot hash tables, IML, DS2...and probably some other methods. – Reeza Jun 29 '17 at 22:17

4 Answers4

1

Data step:

proc sort data=have;
  by name month;
  run;

  data want;
     set have;
     by name month;
     m=month(lag(month));
     if first.id then months=1;
     else if month(date)^=m then months+1;
     if last.id then output;
     keep name months;
run;

Pro Sql:

proc sql;
   select distinct name,count(distinct(month(month))) as months from have group by name;
quit;
Shenglin Chen
  • 4,504
  • 11
  • 11
0

While it's possible to do this in a data step, you wouldn't; you'd use proc freq or similar. Almost every PROC can give you an output dataset (rather than just print to the screen).

PROC FREQ data=sashelp.class;
  tables age/out=age_counts noprint;
run;

Then you can use this output dataset (age_counts) as a SET input to another data step to perform your further calculations.

Joe
  • 62,789
  • 6
  • 49
  • 67
0

You can also use proc sql to group the variable and count how many are in that group. It might be faster than proc freq depending on how large your data is.

proc sql noprint;
    create table counts as
    select AGE, count(*) as AGE_CT from sashelp.class 
    group by AGE;
quit;
kstats9pt3
  • 799
  • 2
  • 8
  • 28
0

If you want to do it in a data step, you can use a Hash Object to hold the counted values:

data have;
do i=1 to 100;
    do V = 'a', 'b', 'c';
        output;
    end;
end;
run;

data _null_;
set have end=last;
if _n_ = 1 then do;
    declare hash cnt();
    rc = cnt.definekey('v');
    rc = cnt.definedata('v','v_cnt');
    rc = cnt.definedone();
    call missing(v_cnt);
end;

rc = cnt.find();
if rc then do;
    v_cnt = 1;
    cnt.add();
end;
else do;
    v_cnt = v_cnt + 1;
    cnt.replace();
end;

if last then
    rc = cnt.output(dataset: "want");
run;

This is very efficient as it is a single loop over the data. The WANT data set contains the key and count values.

DomPazz
  • 12,415
  • 17
  • 23