0

i am learning sas proc sql statement. I observed that although the result are the same for following two methods, the real and cpu time is different. I wonder why there exist the difference.

data data1;
    input name1 $ choice $;
    datalines;
John A
Mary B
Peter C
;
run;

data data2;
    input name2 $ choice2 $;
    datalines;
John B
Mary C
Peter B
run;

Method 1:

proc sql;
    select a.*, b.*
    from data1 as a, data2 as b 
    where a.name1= data2.name2
    ;
quit;

Method 2:

proc sql;
    select a.* , b.*
    from data1 as a inner join data2 as b
        on a.name1 = b.name2
    ;
quit;

enter image description here

useR
  • 3,062
  • 10
  • 51
  • 66
  • If you don't specify a join type you're using a natural join and the where clause is applied on the results. If you specify an inner join with an ON then the results are filtered in the join process. I'm sure someone else will explain it better so I won't post this as an answer – Reeza Sep 08 '15 at 04:16
  • Also, IMO you should always explicitly specify your join type. – Reeza Sep 08 '15 at 04:17
  • A bit more info on the order of SQL processing in SAS. http://blogs.sas.com/content/sastraining/2013/02/04/a-database-professionals-best-friend-2/ – Reeza Sep 08 '15 at 04:23
  • @useR, Can you please add time details in your question. This will be helpful. Please run the same code more than once and cross check, as other system processes may affect this process. – Kay Sep 08 '15 at 06:13
  • amended. s you can see proc sql with on one is much faster. is there a reason? – useR Sep 08 '15 at 06:52
  • Why does one have HTML body file created? If it's the first process and creating the HTML file for the first time that takes more time, try flipping them in order and see. – Reeza Sep 08 '15 at 14:41

1 Answers1

0

Ignoring the unexplained html file and any random fluctuations in CPU and execution time for the sake of discussion, the short answer might just be that SAS handles different joins in different ways by default. Perhaps this doesn't make much of a difference with files as small as the examples here, but it's worth knowing about.

A longer answer is that this may be dependent to some extent on the exact version of SAS you're using. In SAS 9.4 with your example datasets, the query plan I see generated is the same for both joins if you leave proc sql to its own devices:

52         /* Method 1: */
 53         
 54         proc sql _method;
 55             select a.*, b.*
 56             from data1 as a, data2 as b
 57             where a.name1= data2.name2
 58             ;

 NOTE: SQL execution methods chosen are:

       sqxslct
           sqxjhsh
               sqxsrc( WORK.DATA1(alias = A) )
               sqxsrc( WORK.DATA2(alias = B) )
 59         quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       user cpu time       0.01 seconds
       system cpu time     0.00 seconds
       memory              5469.21k
       OS Memory           32668.00k
       Timestamp           09/08/2015 06:43:09 PM
       Step Count                        457  Switch Count  50
       Page Faults                       0
       Page Reclaims                     87
       Page Swaps                        0
       Voluntary Context Switches        156
       Involuntary Context Switches      14
       Block Input Operations            0
       Block Output Operations           16


 60         /* Method 2: */
 61         
 62         proc sql _method;
 63             select a.* , b.*
 64             from data1 as a inner join data2 as b
 65                 on a.name1 = b.name2
 66             ;

 NOTE: SQL execution methods chosen are:

       sqxslct
           sqxjhsh
               sqxsrc( WORK.DATA1(alias = A) )
               sqxsrc( WORK.DATA2(alias = B) )
 67         quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       user cpu time       0.01 seconds
       system cpu time     0.00 seconds
       memory              5467.81k
       OS Memory           32924.00k
       Timestamp           09/08/2015 06:43:09 PM
       Step Count                        458  Switch Count  50
       Page Faults                       0
       Page Reclaims                     26
       Page Swaps                        0
       Voluntary Context Switches        167
       Involuntary Context Switches      11
       Block Input Operations            0
       Block Output Operations           8

You can also confirm this via the _tree option, which produces a more verbose version of the query plan. See here for further details re the output of the _method and _tree options.

Some differences do emerge if you steer the query planner towards using different join algorithms, however:

 52         /* Method 1: */
 53         
 54         proc sql _method magic=101;
 55             select a.*, b.*
 56             from data1 as a, data2 as b
 57             where a.name1= data2.name2
 58             ;
 NOTE: PROC SQL planner chooses sequential loop join.

 NOTE: SQL execution methods chosen are:

       sqxslct
           sqxjsl
               sqxsrc( WORK.DATA1(alias = A) )
               sqxsrc( WORK.DATA2(alias = B) )
 59         quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       user cpu time       0.01 seconds
       system cpu time     0.01 seconds
       memory              5468.53k
       OS Memory           32668.00k
       Timestamp           09/08/2015 06:41:54 PM
       Step Count                        451  Switch Count  52
       Page Faults                       0
       Page Reclaims                     101
       Page Swaps                        0
       Voluntary Context Switches        182
       Involuntary Context Switches      14
       Block Input Operations            0
       Block Output Operations           8


 60         /* Method 2: */
 61         
 62         proc sql _method magic=102;
 63             select a.* , b.*
 64             from data1 as a inner join data2 as b
 65                 on a.name1 = b.name2
 66             ;
 NOTE: PROC SQL planner chooses merge join.

 NOTE: SQL execution methods chosen are:

       sqxslct
           sqxjm
               sqxsort
                   sqxsrc( WORK.DATA1(alias = A) )
               sqxsort
                   sqxsrc( WORK.DATA2(alias = B) )
 67         quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       user cpu time       0.01 seconds
       system cpu time     0.00 seconds
       memory              5467.12k
       OS Memory           32924.00k
       Timestamp           09/08/2015 06:41:54 PM
       Step Count                        452  Switch Count  60
       Page Faults                       0
       Page Reclaims                     69
       Page Swaps                        0
       Voluntary Context Switches        197
       Involuntary Context Switches      13
       Block Input Operations            0
       Block Output Operations           16

For more details on the magic= option, see here. I don't recommend using it in any sort of production environment, but it can sometimes be useful for this sort of thing.

Given the tiny differences in CPU time for files this small even when forcing SAS to use different merge methods, I very much suspect some other factor is causing this; probably the mystery html file.

user667489
  • 9,501
  • 2
  • 24
  • 35