Merge vs Proc sql

Question

I have two ways of doing the same thing but am not getting the same results. what is the reason for this?

First way:

Proc sql;
    Create table abc.immu as
    select ID, DATE, CODE
    from xyz.imm
    where DATE between to_date('2008-01-01','YYYY-MM-DD') and to_date('2016-01-31','YYYY-MM-DD')
    Order by CODE);
        Quit;

Proc sql ;
    Create table abc.testb as
             select CODE, description from xyz.REF_code where lower(description)like '%hep b%';
 Quit;

proc sort data=abc.testb;
by code;
run;


data abc.testb1;
merge abc.immu(in=a) abc.testb (in=b);
by code;
if a=1 and b=1;
run;

Second way

Create table abc.testb1 as
        Select ID, DATE,CODE
    From xyz.imm
    where CODE in (select CODE from xyz.REF_code where lower(description)like '%hep b%')
    AND  DATE between to_date('2008-01-01','YYYY-MM-DD') and to_date('2016-01-31','YYYY-MM-DD')
    Order by ID, DATE;
    Quit;

proc sort data=abc.testb1 nodupkey;
by ID DATE;
run;

Why am not getting the same result? is it not two different ways to achieve same thing?

to_date() is working for you in SAS? That's not SAS code afaik, unless you're using pass through, which doesn't appear to be happening based on what you've posted. I *think* the results may differ if you have duplicates in either table. Run PROC COMPARE to see where the results differ and that'll help you isolate the issue. — Reeza, Aug 17 '17 at 19:33
yes using pass through, even after running no dup key the observations are not same — SAW, Aug 17 '17 at 19:36
You do not have the equivalent of `NODUPKEY` in the first method. — Tom, Aug 18 '17 at 02:00

score 0 · Answer 1 · answered Aug 17 '17 at 20:09

Best guess is that you have duplicate CODE records in abc.testb in the first block. The abc.immu table has that code, but less records than the testb for 1 or more CODE values.

As an example:

Test Data:

data test;
do cd='a','b','c';
    do i=1 to 10;
        output;
    end;
end;
run;

data cds;
do cd='a','c';
    desc=1;
    output;
    desc=2;
    output;
end;
run;

When we have more records for each code in test both work as desired. Below generates the same records.

proc sql noprint;
create table sql as
select * from test
    where cd in (select cd from cds);
quit;

data dataStep;
merge test(in=a) cds(in=b);
by cd;
if a and b;
run;

However, if we strip the cd="a" group to only 1 in test we have an issue:

data test2;
set test;
if cd='a' then
    if i=1;
run;

proc sql noprint;
create table sql2 as
select * from test2
    where cd in (select cd from cds);
quit;

data dataStep2;
merge test2(in=a) cds(in=b);
by cd;
if a and b;
run;

Now there is an extra record in dataStep2.

To fix this you have a few options:

Remove the description column from the select and use a DISTINCT keyword. Then testb will only have 1 row for each CODE.
Use the SQL solution.

I have checked the codes, test b has only 1 row for each code — SAW, Aug 17 '17 at 20:33
without data and a working example, we have to guess. As I said, this was a guess. If you can replicate the problem with an example, someone can explain the issue. See https://stackoverflow.com/help/mcve — DomPazz, Aug 18 '17 at 01:06

Merge vs Proc sql

1 Answers1