0

New to SAS and would appreciate advice and help on how best to handle this data mangement situation.

I have a dataset in which each observation represents a client. Each client has a "description" variable which could include either a comprehensive assessment, treatment or discharge. I have created 3 new variables to flag each observation if they contain one of these.

So for example:

treat_yes = 1 if description contains "tx", "treatment" dc_yes = 1 if description contains "dc", "d/c" or "discharge" ca_yes = 1 if desciption contains "comprehensive assessment" or "ca" or "comprehensive ax"

My end goal is to have a new dataset of clients that have gone through a Comprehensive Assessment, Treatment and Discharge.

I'm a little stumped as to what my next move should be here. I have all my variables flagged for clients. But there could be duplicate observations just because a client could have come in many times. So for example:

Client_id    treatment_yes    ca_yes   dc_yes
1234               0            1        1
1234               1            0        0
1234               1            0        1

All I really care about is if for a particular client the variables treatment_yes, ca_yes and dc_yes DO NOT equal 0 (i.e., they each have at least one "1". They could have more than one "1" but as long as they are flagged at least once).

I was thinking my next step might be to collapse the data (how do you do this?) for each unique client ID and sum treatment_yes, dc_yes and ca_yes for each client.

Does that work?

If so, how the heck do I accomplish this? Where do I start?

thanks everyone!

2 Answers2

0

I think the easiest thing to do at this point is to use a proc sql step to find the max value of each of your three variables, aggregated by client_id:

data temp;
    input Client_id $ treatment_yes ca_yes dc_yes;
    datalines;
    1234 0 1 1
    1234 1 0 0
    1234 1 0 1
    ;
run;

proc sql;
    create table temp_collapse as select distinct
        client_id, max(treatment_yes) as treatment_yes,
        max(ca_yes) as ca_yes, max(dc_yes) as dc_yes
        from temp
        group by client_id;
quit;

A better overall approach would be to use the dataset you used to create the _yes variables and do something like max(case when desc = "tx" then 1 else 0 end) as treatment_yes etc., but since you're still new to SAS and understand what you've done so far, I think the above approach is totally sufficient.

Sean
  • 1,120
  • 1
  • 8
  • 14
  • Thank you so much! This solved my question exactly. Really appreciate you giving me an example to run as well:) – chiccaboomberry Jun 28 '16 at 17:32
  • I notice that the proc sql code creates a database that removes all my other variables. Do you know how to do the above but still keep all my other variables there? – chiccaboomberry Jun 28 '16 at 19:14
  • If the other variables are constant for each value of the CLIENT_ID then add them to the SELECT and GROUP BY clauses. If not then exclude them to the GROUP BY and SAS will happily duplicate the new variables generated by the MAX() aggregate function calls onto all rows for the same CLIENT_ID. In that case you could get multiple observations for the same CLIENT_ID. – Tom Jun 29 '16 at 13:10
0

The following code allows you to preserve other variables from your original dataset. I have added two variables (var1 and var2) for illustrative purposes:

data temp;
    input Client_id $ treatment_yes ca_yes dc_yes var1 var2 $;
    datalines;
    1234 0 1 1 10 A
    1234 1 0 0 11 B
    1234 1 0 1 12 C
    ;
run;

Join the dataset with itself so that each row of a client_id in the original dataset is merged with its corresponding row in an aggregated dataset constructed in a subquery.

proc sql;
    create table want as
        select *
            from temp as a 
                left join (select client_id, 
                    max(treatment_yes) as max_treat,
                    max(ca_yes) as max_ca,
                    max(dc_yes) as max_dc
                from temp
                    group by client_id) as b
                        on a.client_id=b.client_id;
quit;
udden2903
  • 783
  • 6
  • 15