0

I would like to know how to merge two datasets in SAS using a variable's value in the first dataset to select and test a variable in the second dataset.

As an example consider two datasets. The first dataset contains four baby names and the days they were born. The second data set contains three doctors and an array of indicator variables noting if each doctor worked on a particular day. For example Dr. Smith worked on days 2 and 3 only. I would like to create a dataset that lists all the possible baby-doctor combinations where the doctor was working on the day the baby was born.

data babies;
input baby_name $ birth_day;
datalines;
Jake 1
Sonny 4
North 5
Apple 6
;
run;

data doctors;
input  DrLastname $ day1 day2 day3 day4 day5 day6;
datalines;
Jones 1 0 0 1 1 1 
Smith 0 1 1 0 0 0 
Lewis 1 1 1 0 0 0 
;
run;

The solution seems like it should be something like this

proc sql;
 create table merged as
 select babies.*, doctors.* 
 from babies, doctors
 where doctors.day(babies.birth_day) = 1; *<--- incorrect;
quit;

The output should be:

baby_name birth_day DrLastName
Jake      1         Jones
Jake      1         Lewis
Sonny     4         Jones
North     5         Jones
Apple     6         Jones

I have run into this problem a few times and would love to know if this is kind of merge is possible in SAS. Thanks for any help you can provide.

Adam Black
  • 337
  • 3
  • 13

2 Answers2

3

While I probably would also transpose the dataset, it is possible to do so without transposing.

data babies_doctors;
  set babies;
  do _i = 1 to nobs_doctors;
    set doctors point=_i nobs=nobs_doctors;
    array days day1-day6;
    if days[birth_Day] then output;
  end;
run;

This will not be fast, as it checks all rows in the dataset, but it's possible.

Fastest is probably to load it into a vertical hash table (which you could do easily) or a temporary array.

data babies_doctors_array;
  array drnames[32767] $80 _temporary_;
  array drdays[32767,6] _temporary_;
  if _n_=1 then do;
    do _i = 1 to nobs_doctors;
      set doctors point=_i nobs=nobs_doctors;
      array days day1-day6;
      drnames[_i]=DrLastname;
      do _j = 1 to dim(days);
        drdays[_i,_j]=days[_j];
      end;
    end;
  end;
  set babies;
  do _k = 1 to nobs_doctors;
    if drdays[_k,birth_day]=1 then do;
        baby_drlastname = drnames[_k];
        output;
    end;
  end;
run;
Joe
  • 62,789
  • 6
  • 49
  • 67
  • Thank you for these solutions! Am I correct in assuming that there is no way to perform this type of dynamic (for lack of a better word) merge using proc sql? – Adam Black Jul 30 '15 at 19:13
  • There's always a way to do everything in sql, but I don't think it would be as easy. You can't use `vname` in sql, though, which is what you'd have to do really (and there is no array concept), so it would be a lot of typing. – Joe Jul 30 '15 at 19:15
1

I might shift the second dataset and then merge on day.

Something like (in untested pseudo code):

    data new_1-new_6;
    set doctor;
    array day_1-day_6 day_{6}
    for i in 1 to 6:
      if day_{i} = 1 then do;
         day = i;
         output new_{i};
      end;
    end;
    run;

    data stacked;
      set day_1-day_6;
    run;

Then simply merge based on the field day.

JJFord3
  • 1,976
  • 1
  • 25
  • 40