0

I want to compare two datasets and extract the matching observations to a new data set. For example,

I have two datasets, one from October and another from November, as follows.

Dataset October

visitor_id ctn visits kjsjakd83 3243244234 1 sakdsadda 5432223442 2 jhk43242l 3243243244 1 iiiewdaja 9839422022 2

Dataset November

visitor_id ctn visits kjsjakd83 3243244234 1 432hj4hj 3243243244 2 jhk43242l 3243243244 1 xfd3x424 2342344234 2

Now, I want to compare these datasets by CTN and extract all the observations from October dataset for which a matching CTN is found in November dataset. So, the extracted dataset must be as below.

Dataset Match

visitor_id ctn visits kjsjakd83 3243244234 1 jhk43242l 3243243244 1

How can I do this in SAS?

user1219801
  • 169
  • 4
  • 17
  • Please post what code you have tried so far, this sounds like a simple inner join on the face of it. It's also not clear to me whether the last dataset (Match) is the third dataset you refer to, or the output dataset you want. Finally, why are there 2 visitor ids for the same ctn in the November data? Is which one you take relevant? – Longfish May 25 '16 at 10:07
  • `proc sql; create table final as select Distinct a.CTN, a.visits from Nov a inner join Oct b on b.CTN = a.CTN; quit;` – anurag choubey May 25 '16 at 11:26
  • The above code show matching rows from both the data set however i only want matching rows only from November data set – anurag choubey May 25 '16 at 11:29

3 Answers3

1

If you just want the records from OCTOBER whose CTN values are in NOVEMBER then it would seem that this is the most straight forward SQL syntax.

select * from OCTOBER 
  where CTN in (select CTN from NOVEMBER)
;
Tom
  • 47,574
  • 2
  • 16
  • 29
0

Your code works fine if you make oct "a" and nov "b". Also, as Keith mentioned, you have duplicate ctn values in your nov dataset, which you should either fix or explain how you want handled; it may be best to join on a.visitor_id = b.visitor_id and a.ctn = b.ctn.

proc sql;
    create table final as select distinct
        a.* from oct as a inner join nov as b
        on a.ctn = b.ctn;
quit;
Sean
  • 1,120
  • 1
  • 8
  • 14
  • That's in fact correct. OP does need to join on both ids and return only Nov records: `b.*` Please revise accordingly. – Parfait May 25 '16 at 13:16
0

@Tom's answer is how I'd do it, but here's how to do it without PROC SQL:

proc sort data=october;
  by ctn;
run;

proc sort data=november;
  by ctn;
run;

data match;
  merge october (IN=oct)
        november (IN=nov keep=ctn);
  by ctn;
  if oct and nov;
run;
Dominic Comtois
  • 10,230
  • 1
  • 39
  • 61