Ignoring the unexplained html file and any random fluctuations in CPU and execution time for the sake of discussion, the short answer might just be that SAS handles different joins in different ways by default. Perhaps this doesn't make much of a difference with files as small as the examples here, but it's worth knowing about.
A longer answer is that this may be dependent to some extent on the exact version of SAS you're using. In SAS 9.4 with your example datasets, the query plan I see generated is the same for both joins if you leave proc sql
to its own devices:
52 /* Method 1: */
53
54 proc sql _method;
55 select a.*, b.*
56 from data1 as a, data2 as b
57 where a.name1= data2.name2
58 ;
NOTE: SQL execution methods chosen are:
sqxslct
sqxjhsh
sqxsrc( WORK.DATA1(alias = A) )
sqxsrc( WORK.DATA2(alias = B) )
59 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5469.21k
OS Memory 32668.00k
Timestamp 09/08/2015 06:43:09 PM
Step Count 457 Switch Count 50
Page Faults 0
Page Reclaims 87
Page Swaps 0
Voluntary Context Switches 156
Involuntary Context Switches 14
Block Input Operations 0
Block Output Operations 16
60 /* Method 2: */
61
62 proc sql _method;
63 select a.* , b.*
64 from data1 as a inner join data2 as b
65 on a.name1 = b.name2
66 ;
NOTE: SQL execution methods chosen are:
sqxslct
sqxjhsh
sqxsrc( WORK.DATA1(alias = A) )
sqxsrc( WORK.DATA2(alias = B) )
67 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5467.81k
OS Memory 32924.00k
Timestamp 09/08/2015 06:43:09 PM
Step Count 458 Switch Count 50
Page Faults 0
Page Reclaims 26
Page Swaps 0
Voluntary Context Switches 167
Involuntary Context Switches 11
Block Input Operations 0
Block Output Operations 8
You can also confirm this via the _tree
option, which produces a more verbose version of the query plan. See here for further details re the output of the _method
and _tree
options.
Some differences do emerge if you steer the query planner towards using different join algorithms, however:
52 /* Method 1: */
53
54 proc sql _method magic=101;
55 select a.*, b.*
56 from data1 as a, data2 as b
57 where a.name1= data2.name2
58 ;
NOTE: PROC SQL planner chooses sequential loop join.
NOTE: SQL execution methods chosen are:
sqxslct
sqxjsl
sqxsrc( WORK.DATA1(alias = A) )
sqxsrc( WORK.DATA2(alias = B) )
59 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.01 seconds
memory 5468.53k
OS Memory 32668.00k
Timestamp 09/08/2015 06:41:54 PM
Step Count 451 Switch Count 52
Page Faults 0
Page Reclaims 101
Page Swaps 0
Voluntary Context Switches 182
Involuntary Context Switches 14
Block Input Operations 0
Block Output Operations 8
60 /* Method 2: */
61
62 proc sql _method magic=102;
63 select a.* , b.*
64 from data1 as a inner join data2 as b
65 on a.name1 = b.name2
66 ;
NOTE: PROC SQL planner chooses merge join.
NOTE: SQL execution methods chosen are:
sqxslct
sqxjm
sqxsort
sqxsrc( WORK.DATA1(alias = A) )
sqxsort
sqxsrc( WORK.DATA2(alias = B) )
67 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5467.12k
OS Memory 32924.00k
Timestamp 09/08/2015 06:41:54 PM
Step Count 452 Switch Count 60
Page Faults 0
Page Reclaims 69
Page Swaps 0
Voluntary Context Switches 197
Involuntary Context Switches 13
Block Input Operations 0
Block Output Operations 16
For more details on the magic=
option, see here. I don't recommend using it in any sort of production environment, but it can sometimes be useful for this sort of thing.
Given the tiny differences in CPU time for files this small even when forcing SAS to use different merge methods, I very much suspect some other factor is causing this; probably the mystery html file.