0

I am trying to write a simple code to count number of services provided each year for a variety of providers, based on a single original column. Ideally my output would simplistically look something like:

Input data would look something like this:

Prov_ID    Name    Service_Cd  Date
A          Joe     B2           02JUN2012
A          Joe     C9           04OCT2013
A          Joe     B2           12JUL2014
B          Steve   A1           12MAR2012
B          Steve   E4           20OCT2013
C          Tom     B10          23SEP2012
...        ...     ...          ...
...        ...     ...          ...

AND So on, with the goal being to be have a unique provider id on, a total services performed, and then totals per year 2012, 2013, 2014.

ProvID  Name  Service_Count  2012_Count  2013_Count  2014_Count
A       Joe    12             4            6            2
B       Steve  15             5            5            5 
C       Tom    22             10           8            4

A dulled down current version of the code I have written for this particular task includes:

proc sql;
CREATE TABLE provider_detail as
    SELECT distinct(PROV_ID), COUNT(distinct Service_CD)AS Service_Count, COUNT (date between '01JAN2012'd AND '31DEC2012'd)AS 2012_Count,COUNT (date between '01JAN2013'd AND '31DEC2013'd)AS 2013_Count,COUNT (date between '01JAN2014'd AND '31DEC2014'd)AS 2014_Count
    FROM primary1
    Group BY PROV_ID;
    run;

But doing so I get the same count in each column. Any help would be greatly appreciated as I am new to SAS and still learning the ropes so to say. Thank you!

Brad
  • 85
  • 12
  • Replace `count()` with `sum()` (except the first one) and it will work (also let the output var names start with a letter or an underscore and replace `run;` with `quit;`) – DaBigNikoladze May 07 '15 at 08:27
  • I think some of the confusion here is because no one knows what your input data actually looks like. It might help to post a sample of the input data set. – DWal May 07 '15 at 13:09
  • Thank you, @DaBigNikoladze changing the changing the count to sum worked. – Brad May 07 '15 at 14:30
  • @DWal, I agree I should have done that originally, I will add sample input code for others to potentially use this. – Brad May 07 '15 at 14:31

1 Answers1

1

Your date between bit is wrong, primarily. This is something SAS would generally be very happy to do for you in a PROC, though the SQL is not particularly complicated either.

In SQL, one option is to use case when to get this sort of information. See something like the following:

proc sql;
    select make, count(distinct model) as total, 
        count(distinct case when driveTrain='Rear' then Model else ' ' end) as RWD,
        count(distinct case when driveTrain='Front' then model else ' ' end) as FWD,
        count(distinct case when driveTrain='All' then model else ' ' end) as AWD        
    from sashelp.cars
    group by make;
quit;

case when is basically if in sql: at a row level, it does something conditionally, then puts that in the select result.

Of course, in SAS this is easier to do with a tabulation procedure (proc freq, proc means, proc tabulate):

proc tabulate data=sashelp.cars;
    class make drivetrain;
    tables make,drivetrain*n;
run;

As noted below this requires the unique-ness already to be taken care of separately.

Joe
  • 62,789
  • 6
  • 49
  • 67
  • The `date between` bit is not wrong, it only has to be inside the function `sum()` instead of `count()`. In this way it will sum the 0/1 logic condition counting the number of services between the desidered dates. This is more flexible than preparing `group by` variables, for example you can count inside overlapping date intervals. – DaBigNikoladze May 07 '15 at 08:26
  • If make/model combos are not unique, then that proc tabulate won't produce the desired results. If make/model are unique then the `distinct` clauses are unnecessary and you could just use `count(1) as total, sum(driveTrain='Rear') as RWD,...` as BDN suggests. This would match the proc tabulate above. – DWal May 07 '15 at 13:40
  • @DaBigNikoladze Yes, that's another option - I like this way better, I find `case when` more flexible for my needs. I didn't mean to say `date between` was not possible to make work, just that it was where the problem is (as you note, `sum` would work). – Joe May 07 '15 at 14:20
  • @DWal Absolutely, and I meant to leave that as a note - got distracted with work and forgot before I posted. Thanks for the reminder. – Joe May 07 '15 at 14:21