1

I tried to filter data where they are on a list by using Data step in SAS

proc sql;
create table id_list as
select distinct id from customer;
quit;

data test;
set fulldata;
where id in id_list;
run;

It doesnt work. However, if I use "where id in (1,2,3)" it works. Could anyone please help me with where in a list of data ? Thanks

Linh
  • 13
  • 3

3 Answers3

1

You need to use a macro variable to save and reference your id list. You cannot reference a separate table in a where statement within the data step.

proc sql noprint;
    select distinct id 
    into :id_list separated by ','
    from customer
    ;
quit;

&id_list will resolve to 1,2,3 if your distinct customer IDs are 1, 2, and 3.

data test;
    set fulldata;
    where id in(&id_list);
run;
Tom
  • 47,574
  • 2
  • 16
  • 29
Stu Sztukowski
  • 10,597
  • 1
  • 12
  • 21
  • Yes, it works. However, the id_list is very large (about 9000 records) so it has an error. ERROR: The length of the value of the macro variable ID_LIST (90795) exceeds the maximum length (65534). The value has been truncated to 65534 characters. – Linh Nov 18 '22 at 22:05
  • In that case, use the SQL approach that Reeza posted. – Stu Sztukowski Nov 18 '22 at 22:24
  • Thanks, I want to employ DATA step as I have more than 1 table i.e fulldata_1, fulldata_2,... etc if I using SQL I need to write long code and union all together – Linh Nov 19 '22 at 21:48
  • You may want to consider using DS2 in that case. You can combine SQL and data step-like code together. I can post an example of this for you in the next few days. – Stu Sztukowski Nov 20 '22 at 00:49
1

If your fulldata is sorted or indexed by id then you can use a MERGE.

This approach is useful when the list is very large and could exceed 64K characters when placed in a macro variable.

proc sort data=customer(keep=id) out=list nodupkey;
  by id;

data want;
  merge fulldata(in=full) list(in=list);
  by id;
  if full and list;
run;

For the case of wanting to stack multiple data sets the use of a hash is recommended.

Example:

Several big tables with some overlapping id values are to be stacked and filtered by matching ids to those in a smaller table that might have repeated ids.

data big1 big2 big3 big4 big5 big6 big7;
  do id = 1 to 6666;
    if 0.00 <= id / 3333 <= 0.50 then output big1;
    if 0.25 <= id / 3333 <= 0.75 then output big2;
    if 0.50 <= id / 3333 <= 1.00 then output big3;
    if 0.75 <= id / 3333 <= 1.25 then output big4;
    if 1.00 <= id / 3333 <= 1.50 then output big5;
    if 1.25 <= id / 3333 <= 1.75 then output big6;
    if 1.50 <= id / 3333 <= 2.00 then output big7;
  end;
run;

data small;
  do _n_ = 1 to 666;
    id = rand('integer', 6666);
    output;
    do while (rand('uniform') < 0.10);
      output;
    end;
  end;
run;

data want;
  attrib id length=8;

  if _n_ = 1 then do;
    declare hash lookup (dataset:'small');
    lookup.defineKey('id');
    lookup.defineDone();
  end;

  set big1-big7 indsname=from;
  source=from;

  if lookup.check() = 0;
run;
Richard
  • 25,390
  • 3
  • 25
  • 38
  • Massive thanks! This is the code that I need. What if I have more than 2 tables fulldata (let say fulldata_1, fulldata_2,..etc) How can I write it in one code without using another data step? – Linh Nov 20 '22 at 02:08
  • Are you saying you want to stack the >1 `fulldata` table ? – Richard Nov 20 '22 at 12:42
1

SQL is easiest here directly, using a subquery in the IN filter.

proc sql;
create table test as
select * 
from fulldata 
where id in 
  (select distinct id from customer);
quit;
Reeza
  • 20,510
  • 4
  • 21
  • 38
  • Thanks Reeza, it helps. The reason I want to employ DATA step as I have more than 1 table i.e fulldata_1, fulldata_2,... etc if I using SQL I need to write long code and union all together – Linh Nov 19 '22 at 21:49