0

I am constantly using SAS datasets in SAS EG to create macro variables that can be used as variables in a query from SAS EG to my internal servers. There is a character limit of 65,534 for a macro variable. When I need to get 100k ids that are 9 to 15 digits in length the number of macro variables required to create really adds up. I am asking the community if there is a way to create a large number of macro variables with a loop instead of doing it manually.

For instance the manual way to create these macro variables would be something like the following:

    proc sql; create table alerts as select distinct review_id format best12. from q4_21_alerts order by review_id;quit;
    proc sql; create table alerts1 as select review_id, monotonic() as number from alerts order by number; quit;

    proc sql; select distinct review_id into:alert_ids1 separated by ',' from alerts1 where number between 1 and 7000; quit;

    proc sql; select distinct review_id into:alert_ids2 separated by ',' from alerts1 where number between 7001 and 14000; quit;

    proc sql; select distinct review_id into:alert_ids3 separated by ',' from alerts1 where number between 14001 and 21000; quit;

    proc sql; select distinct review_id into:alert_ids4 separated by ',' from alerts1 where number between 21001 and 27000; quit;
    .
    .
    .
    proc sql; select distinct review_id into:alert_ids21 separated by ',' from alerts1 where number between 140001 and 147000; quit;

I am hoping to find a way to do something like the following:

N = 145417 
#total number of review_ids that need to be contained in SAS macro variables
L = 8 
#length/number of characters/digits in each review_id
L = L + 1
#length/number of characters/digits in each review_id with 1 added for the comma separation in macro variable
stop = N*L

i = 1
while(i<=stop){
 some code to create all 21 macro variables 
}

then be left with macro variables alert_ids1, alert_ids2,...,alert_ids21 that would contain all 145,417 ids i need to then use in a query for my internal servers.

Any help would be appreciated!

I've used google and sas communities and have code to do this process manually...

2 Answers2

1

I am unsure what your final query is and would advise building a SQL query that specifically filters to the IDs you want. e.g.:

proc sql;
    create table want as
        select * 
        from have
        where id in(select id from id_table)
    ;
quit;

But if you need to have a comma-separated list of macro variables that abides by the 65,534 character length, the safest way is to create one ID per macro variable. You can very easily do this with a data step.

data _null_;
    set alerts1;
    
    call symputx(cats('alert_id', _N_), review_id); 
    call symputx('n_ids', _N_);
run;

This will create the macro variables:

alert_id1
alert_id2
alert_id3 
...

Now you need to create a loop that makes these all comma-separated.

%macro id_loop;
    %do i = 1 %to &n_ids;
&&alert_id&i %if(&i < &n_ids) %then %do;,%end;
    %end;
%mend;

Note the code format is a bit strange to keep the output formatted correctly. Now run this macro and you'll see a comma-separated list of every alert ID:

%put %id_loop;

id1, id2, id3, ...

You can put this in a query, such as where alert_id in (%id_loop). Keep in mind that doing this will load up the symbol table with a ton of macro variables. This is not the recommended way to query, but it is one way to achieve what you asked.

Stu Sztukowski
  • 10,597
  • 1
  • 12
  • 21
1

Use a data step instead of SQL to create the macro variables. You can even create a second macro variable that references all of the generated macro variables.

For example say you have determined that you can always fit 1000 values into a single variable (the limit for a data step variable is 32K instead of the 64K limit of a macro variable) then you could use a data step like this:

data _null_;
   length string list $32767 ;
   group+1;
   do i=1 to 1000 until(eof);
     set alerts end=eof;
     string=catx(',',string,review_id);
   end;
   call symputx(cats('alert_id',group),string);
   list = catx(',',list,cats('&alert_id',group'));
   if eof then call symputx('alerts',list);
run;

Now you can use that single macro variable ALERTS that consists of the string

 &alert_id1,&alert_id2,....

in your SQL code:

 where review_id in (&alerts)

And filter on all of the values in the ALERTS dataset even if the total string is longer than 64K. Since you put 1000 into each macro variable and you can fit about 3000 references to those macro variables into the ALERTS macro variable you could store up to 3 million values.

Of course you might hit a limit on what the SQL processor can handle.

Tom
  • 47,574
  • 2
  • 16
  • 29