Deleting Observations That Are Not Part of a List in SAS

Question

My first SAS data set, ds1 contains dates, firms and share prices. My second data set, ds2 contains a subset of the firms in ds1. I'd like to create ds3 which contains all of the observations in ds1 provided a firm in ds1 is also in ds2. I try doing this as follows:

DATA ds3; set ds1; IF firm IN (d2); run;

The above does not work as planned as ds3 ends up containing no observations. I believe the problem is the IF IN statement. I could manually type all the firms in the parenthesis instead of putting d2 there but that would be very inefficient for me.

score 1 · Answer 1 · answered Sep 07 '14 at 21:39

You have several options here; the correct one depends largely on your particular needs.

What you're effectively doing is joining two tables together. So, a MERGE or a SQL JOIN would be a simple solution.

data ds3;
  merge ds1(in=_ds1) ds2(in=_ds2 keep=firm);
  by firm;
  if _ds1 and _ds2;
run;

That joins ds1 and ds2, only keeping the firm variable from ds2, and keeps only firms that are in both. Both DS1 and DS2 need to be sorted by firm;, and DS2 should have have only unique values of firm - no duplicates.

SQL is also pretty easy.

proc sql;
  create table ds3 as
    select * from ds1 where exists (
      select 1 from ds2 
         where ds1.firm=ds2.firm
    );
quit;

That's a bit closer to your terminology (unsurprising as SQL attempts to be close to natural language for many simple queries). That doesn't require either sorting or uniqueness, although this won't be particularly fast.

You could also store the DS2 firms in a format, or use a hash table to allow you to open DS2 alongside DS1. You could also use a keyed set statement. All of these are slightly more complicated to implement, but are generally faster as they don't require sorting and don't repeatedly reference the same data as the SQL does.

Deleting Observations That Are Not Part of a List in SAS

1 Answers1