2

I have performed a left join where the left table has 500,000 observations. In some cases the left join has been successful for Business_Line = "Retail" where as the next observation is left blank, why is this?

the code I used:

proc sql;
create table joined2 as
select a.*
      ,b.Join1
      ,b.Join2
      ,b.Join3
from joined as a
left join Sasdata.Assumptions as b
on a.Business_Line = b.Business_Line;
quit;

the two tables look like

data joined;
input Business_Line $;
datalines;
Retail
Retail
Retail
Business
Business
;
run;

the table to join looks like

data sasdata.assumptions;
input Business_Line $ Join1 Join2 Join3;
datalines;
Retail 10% 10% 10%
Business 20% 10% 5%
;
run;

the current resulting table looks like

 business_line join1 join2 join3
 Retail 10% 10% 10%
 Retail . . .
 Business 20% 10% 5%
 Business . . .
78282219
  • 593
  • 5
  • 21

1 Answers1

2

The sample code does not demonstrate the issue.

Indeed, missing values for join1-join3 will not happen when the actual business_lines values are 'Retail' or Business. You get a result with 3x1 rows with Retail and 2x1 rows with Business.

The missing values occur when the join key in the left table does not have a corresponding match in the right table. This can appear to happen in SAS if the variables are formatted.

Suppose business_line was an integer with formatted value

proc format;
  value line
    101 = 'Retail'
    102 = 'Retail'
    103 = 'Retail'
    201 = 'Business'
    202 = 'Business'
  ;

Updated data with formatted business_line

data joined;
input Business_Line;
format Business_Line line.;
datalines;
101
102
102
201
202
run;    

data assumptions;
input Business_Line Join1 Join2 Join3;
format Business_Line line.;
datalines;
101 10 10 10
201 20 10  5
run;

Join that has some unmatched underlying values

proc sql;
create table joined2 as
select a.*
      ,b.Join1
      ,b.Join2
      ,b.Join3
from joined as a
left join Assumptions as b
on a.Business_Line = b.Business_Line;
quit;

options nocenter; ods listing;
proc print data=joined2;
run;

Has results demonstrating missing values

       Business_
Obs      Line       Join1    Join2    Join3

 1     Retail         10       10       10
 2     Retail          .        .        .
 3     Retail          .        .        .
 4     Business       20       10        5
 5     Business        .        .        .
Richard
  • 25,390
  • 3
  • 25
  • 38
  • Does this occur even though 75% of the observations do join but then suddenly cuts off for the same business name label? – 78282219 Apr 04 '18 at 19:54
  • If the variables are not formatted, and you are still getting seemingly non-matches, you may have data value issues, such as leading or trailing non-space whitespace characters. I would recommend doing a Proc FREQ of the join variables in each data set. If you have two rows that look the same (business_line) you have a data issue. – Richard Apr 05 '18 at 06:19