The functional equivalent of what you're trying to do is the following:
data _null_;
set issues1(rename=(issue_count=_issue_count
resolved_count=_resolved_count)) end=done;
if _n_=1 then do;
declare hash total_issues();
total_issues.defineKey("request", "area");
total_issues.defineData("request", "area", "issue_count", "resolved_count");
total_issues.defineDone();
end;
if total_issues.find() ne 0 then do;
issue_count = _issue_count;
resolved_count = _resolved_count;
end;
else do;
issue_count + _issue_count;
resolved_count + _resolved_count;
end;
total_issues.replace();
if done then total_issues.output(dataset: "issues2");
run;
This method does not require you to to pre-sort the dataset. I wanted to see what kind of performance you'd get with using different methods so I did a few tests on a 74M row dataset. I got the following run-times (your results may vary):
Unsorted Dataset:
Proc SQL
- 12.18 Seconds
Data
Step With Hash Object Method (above) - 26.68 Seconds
Proc Means
using a class
statement (nway) - 5.13 Seconds
Sorted Dataset (36.94 Seconds to do a proc sort):
Proc SQL
- 10.82 Seconds
Proc Means
using a by
statement - 9.31 Seconds
Proc Means
using a class
statement (nway) - 6.07 Seconds
Data
Step using by
statement (I used the code from Joe's answer) - 8.97 Seconds
As you can see, I wouldn't recommend using the data step with the hash object method shown above since it took twice as long as the proc sql.
I'm not sure why proc means
with a by
statement took longer then proc means
with a class
statement, but I tried this on a bunch of different datasets and saw similar differences in runtimes (I'm using SAS 9.3 on Linux 64).
Something to keep in mind is that these runtimes might be completely different for your situation but I would recommend using the the following code to do the summation:
proc means data=issues1 noprint nway;
class request area;
var issue_count resolved_count;
output out=issues2(drop=_:) sum=;
run;