In many scenarios the following query gives me what I need:

proc sql;
   create table test as
   select ID,count(task) as count
   from table 
   group by ID;
quit;
  • The "tasks" being counted are SSNs, which are character values.

I tried to replicate this using proc means to expand my methods -

proc means data = table noprint;
   class ID;
   var task;
   output out=test sum=tot;
run;

I get an error that reads -

ERROR: Variable TASK in list does not match type prescribed for this list.

I am assuming this is because I am telling it to "sum" a character variable, when really what I want to do is "count" the observations by ID. The word "sum" may not be the key word to use here, but I don't know what other keyword would give me a "count" by ID. Is this a simple syntax error in the proc means step, or is this the wrong approach?

andrey_sz
SMW
  • What is it you think the SQL code is doing? You are asking it to count how many non-missing values of `TASK` there are within each value of `ID`. If the same value of `TASK` appears 5 times, it will add 5 to the total. If the values of `TASK` are unique within `ID`, then why not just tell SQL to `count(*)`? And in `PROC MEANS`, leave off the `VAR` statement and just use the `_FREQ_` variable in the output dataset. – Tom Dec 18 '15 at 16:50
  • Although you can use Proc Means as demonstrated, the appropriate proc for counting is usually PROC FREQ. – Reeza Dec 18 '15 at 17:56
  • @Tom - Good point. I should have mentioned that the data set being summarized is already cleaned (distinct observations by ID and Task), so count(*) would give me the same results. I have another question that may or may not belong in a separate post: I wanted to test the proc method against the SQL method to see which is more efficient, since this process runs on data sets that are much larger. What are the benefits of one over the other? – SMW Dec 18 '15 at 21:03
  • I should clarify further. The PROC SQL method logs "NOTE: PROCEDURE SQL used (Total process time): real time 0.22 seconds cpu time 0.10 seconds". The PROC MEANS method logs "NOTE: PROCEDURE MEANS used (Total process time): real time 0.05 seconds cpu time 0.06 seconds". Will the proc method always be faster, as in this example? – SMW Dec 18 '15 at 21:13
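
For reference, the PROC FREQ approach mentioned in the comments might look like the sketch below. The sample rows in `table` are made up for illustration, and the WHERE statement is an assumption added to mimic `count(task)`, which skips missing values.

```sas
/* Hypothetical sample data; task is a character variable */
data table;
   input ID task :$11.;
   datalines;
1 111-11-1111
1 222-22-2222
2 333-33-3333
;
run;

/* One output row per ID; COUNT holds the number of rows for that ID.
   PERCENT is dropped because only the count is wanted here. */
proc freq data=table noprint;
   where not missing(task);
   tables ID / out=test(drop=percent);
run;
```

The output data set `test` then contains `ID` and `COUNT`, which is the same shape as the PROC SQL result.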

1 Answer

Leave off the VAR statement and use the _FREQ_ variable in the output data set:

proc means data = table noprint;
   class ID;
   /* No VAR statement: _FREQ_ in TEST holds the observation count per ID */
   output out=test;
run;
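
Building on this, here is a minimal sketch of how the output could be trimmed to match the SQL result. The sample rows, the `NWAY` option, the `KEEP=ID` data set option, and the rename to `count` are additions for illustration and are not part of the original answer.

```sas
/* Hypothetical sample data; task is a character variable */
data table;
   input ID task :$11.;
   datalines;
1 111-11-1111
1 222-22-2222
2 333-33-3333
;
run;

/* NWAY keeps only the per-ID rows and drops the overall (_TYPE_=0) row.
   KEEP=ID leaves no analysis variables, so TEST holds only the counts. */
proc means data=table(keep=ID) noprint nway;
   class ID;
   output out=test(drop=_type_ rename=(_freq_=count));
run;
```

Note that `_FREQ_` counts every observation in the group, so this matches `count(*)`; it matches `count(task)` only when `task` is never missing. The CLASS statement also excludes observations with a missing `ID` unless the `MISSING` option is added.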
data _null_