0

I have a SAS Data set called coaches_assistants with the following structure. There are always only two records per TeamID.

TeamID     Team_City    CoachCode
123        Durham       Head_242
123        Durham       Assistant_876
124        London       Head_876
124        London       Assistant_922
125        Bath         Head_667
125        Bath         Assistant_786
126        Dover        Head_544
126        Dover        Assistant_978
...        ...          ....

What I'd like to do with this is to create a data set with an extra field called AssistantCode and make it look like:

TeamID     Team_City    HeadCode   AssistantCode
123        Durham       242        876
124        London       876        922
125        Bath         667        786
126        Dover        544        978
...        ...          ...        ...

If possible, I'd like to do this in a single DATA step (though I recognize that I might need a PROC SORT step first). I know how to do it in python or ruby or any traditional scripting languages, but I don't know how to do it in SAS.

What's the best way to do this?

Clay
  • 2,949
  • 3
  • 38
  • 54

3 Answers3

2

While it's possible to do in one datastep, I generally find that this sort of problem is better served in PROC TRANSPOSE. Less manual coding this way and more flexibility for new things (say a new value "HeadAssistant" appeared, this would instantly work).

data have;
length coachcode $25;
input TeamID     Team_City  $  CoachCode $;
datalines;
123        Durham       Head_242
123        Durham       Assistant_876
124        London       Head_876
124        London       Assistant_922
125        Bath         Head_667
125        Bath         Assistant_786
126        Dover        Head_544
126        Dover        Assistant_978
;;;;
run;

data have_t;
set have;
id=scan(coachcode,1,'_');
val = scan(coachcode,2,'_');
keep teamId team_city id val;
run;

proc transpose data=have_t out=want(drop=_name_);
by teamID team_city;
id id;
var val;
run;
Joe
  • 62,789
  • 6
  • 49
  • 67
  • I like how clean this looks, but when I tried to run it with my data set (with 120,000+ observations), the `want` table was created without any columns for some reason. I'll try it again later. Thanks! – Clay Jul 06 '13 at 23:57
1

Here are two possible solutions (one using a data step as requested and another using PROC SQL):

data have;
   length TeamID $3 Team_City CoachCode $20; 
   input TeamID $ Team_City $ CoachCode $;
   datalines;
123        Durham       Head_242
123        Durham       Assistant_876
124        London       Head_876
124        London       Assistant_922
125        Bath         Head_667
125        Bath         Assistant_786
126        Dover        Head_544
126        Dover        Assistant_978
run;

/* A data step solution */
proc sort data=have;
   by TeamID;
run;

data want1(keep=TeamID Team_City HeadCode AssistantCode);
   /* Define all variables, retain the new ones */
   length TeamID $3 Team_City $20 HeadCode $3 AssistantCode $3; 
   retain HeadCode AssistantCode; 
   set have;
      by TeamID;
   if CoachCode =: 'Head'
      then HeadCode = substr(CoachCode,6,3);
      else AssistantCode = substr(CoachCode,11,3);
   if last.TeamID;
run;

/* An SQL solution */
proc sql noprint;
   create table want2 as
   select TeamID
        , max(Team_City) as Team_City
        , max(CASE WHEN CoachCode LIKE 'Head%'
                   THEN substr(CoachCode,6,3) ELSE ' '
              END) LENGTH=3 as HeadCode
        , max(CASE WHEN CoachCode LIKE 'Assistant%'
                   THEN substr(CoachCode,11,3) ELSE ' '
              END) LENGTH=3 as AssistantCode
   from have
   group by TeamID;
quit;

PROC SQL has the advantage of not requiring you to sort the data in advance.

BellevueBob
  • 9,498
  • 5
  • 29
  • 56
  • Thanks for the details. I went with the first approach and it worked like a charm. I haven't yet delved into SAS SQL, so I'll look at it later when I get the chance. – Clay Jul 06 '13 at 23:56
  • 1
    When you use a `BY` statement in data step processing, special variables are automatically created to assist processing. Two "dot" variables are created for each variable listed in the statement: `FIRST.variable` and `LAST.variable`, which identify the relative position of an obs within a group. `if LAST.TeamID;` is a "subsetting-IF" statement used to only output one obs per TeamID. – BellevueBob Jul 08 '13 at 14:17
  • Thank you for the explanation. I didn't know about `FIRST` and `LAST`. – Clay Jul 08 '13 at 17:34
0

This assumes you've sorted the data by teamID, and head coaches always come before assistants. Caveat: untested (I really need to get access to SAS again....)

data want (drop=nc coachcode);
    set have;
    length headcode assistantcode $3;
    retain headcode;
    by teamid;
    nc = length(coachcode);
    if substr(coachcode, 1, 4) = 'Head' then
        headcode = substr(coachcode, nc-2, nc);
    else
        assistantcode = substr(coachcode, nc-2, nc);
    if last.teamid;
run;
Hong Ooi
  • 56,353
  • 13
  • 134
  • 187