0

I am very keen to learn whether I can handle or not such situations in SAS Base without using SAS IML;

Let's say I have the vector have

            a    b     c     d    e        f
           1001 JPN 10,000  50%  JPN    2,000
           1001 EUR 12,648  100% EUR    3,000
           1001 USD 15,997  50%  USD    5,000
           1001 JPN 20,233  20%  JPN    8,000
           1001 EUR 25,591  20%  EUR    9,000
           1001 USD 32,368  50%  USD    4,000
           1002 JPN 28,393  50%  JPN    6,000
           1002 EUR 24,906  100% EUR    4,000
           1002 USD 21,847  50%  USD    8,000
           1002 TRY 19,164  20%  JPN    6,000
           1002 EUR 16,811  50%  EUR    15,000
           1002 USD 14,746  100% USD    52,000
           1003 USD 10,000  50%  XVN    8,000



 %macro;
 % let i = 1;
 data want;
 set have;
 %do %while a[&i]=a[eval(&i+1)] ;
 b = &i;
 &i=eval(&i+1);
 %end
 %mend

What I would like to do is for a with b=e to take the difference of max(c) and max(f) and multiply this difference with d and then for each distinct a to sum these outcomes. This will be iterative. The table I compose here just a small representation of the case.

Thanks

Jonsi Billups
  • 133
  • 1
  • 3
  • 15
  • Can you explain in words what output you want? It sort of looks like you want to first. processing, but without understanding the output you want it is hard to give any advice. – Tom Sep 18 '15 at 11:20
  • I just want to build loops based on the observation number of elements of a column like in R or Matlab etc. but I could not do it in SAS Base. I wonder if it is possible or not. – Jonsi Billups Sep 18 '15 at 11:26
  • 1
    So a basic data step is a loop based on the observations in the input data step. What do you actually want to do a loop over? – Tom Sep 18 '15 at 11:27
  • I just want to know how I can build loops by using index number in SAS or is it possible to use index number just like in R or Matlab? – Jonsi Billups Sep 18 '15 at 11:37
  • Index number of what to do what? Can you describe an actual problem where such an approach makes sense? That will be easier for someone to answer than a totally general question. – Tom Sep 18 '15 at 11:40
  • 1
    Post the expected output given the sample data above. – Robert Penridge Sep 18 '15 at 14:23
  • You can do what you want, but you need to think about it differently, using things are native to SAS. If you want to replicate the same way then it's best to go with IML. If none of the answers below meet your needs, I highly agree with Robert's comment above, post a worked example. – Reeza Sep 18 '15 at 15:01
  • This question doesn't make any sense in the context of SAS. SAS is not R or Matlab. SAS doesn't have vectors. You don't control looping (mostly) over the data. SAS loops for you. Your question isn't really answerable with your apparent level of understanding of Base SAS. I suggest learning some basic Base SAS, and then come back and see if you need additional help. – Joe Sep 18 '15 at 15:50

3 Answers3

1

It sounds like what you mean by this phrase "for a with b=e" is that you want to process BY A the records WHERE B=E.
So in SAS if you want to filter the records to use you would use a WHERE statement or option or an IF statement. To process groups of records with same value of a set of variables you would use a BY statement (or GROUP BY in PROC SQL code). So here is your example dataset.

data have ;
  length a 8 b $3 c d 8 e $3 f 8;
  informat c f comma32. d percent. ;
  input a b c d e f ;
cards;
1001 JPN 10,000  50%  JPN    2,000
1001 EUR 12,648  100% EUR    3,000
1001 USD 15,997  50%  USD    5,000
1001 JPN 20,233  20%  JPN    8,000
1001 EUR 25,591  20%  EUR    9,000
1001 USD 32,368  50%  USD    4,000
1002 JPN 28,393  50%  JPN    6,000
1002 EUR 24,906  100% EUR    4,000
1002 USD 21,847  50%  USD    8,000
1002 TRY 19,164  20%  JPN    6,000
1002 EUR 16,811  50%  EUR    15,000
1002 USD 14,746  100% USD    52,000
1003 USD 10,000  50%  XVN    8,000
run;

First you can find the max of the C and F variables using PROC SUMMARY.

proc summary data=have nway ;
  by a ;
  where b=e ;
  var c f ;
  output out=summary max= / autoname ;
run;

Then you can merge back with the detail records to apply calculate the difference and multiply by the percentage.

data middle ;
  merge have summary ;
  by a ;
  if b=e ;
  diff = c_max - f_max ;
  product = diff * d ;
run;

Then you can use PROC SUMMARY again to find the sum within each group.

proc summary data=middle nway ;
  by a ;
  var product ;
  output out=want sum=new_var ;
run;

Totals

Obs      a     _TYPE_    _FREQ_     new_var
1     1001       0         6       67767.2
2     1002       0         5      -82624.5
Tom
  • 47,574
  • 2
  • 16
  • 29
1

Thank you for posting! I am assuming you want the max value when b = e, but I will give you two possible solutions just in case. Note that you only need to use proc sql once during any SQL calculation groups, but for clarity's sake we'll just do it twice.

Step 1: Get max values of c and f for when b = e

proc sql noprint;
    create table maxes as
        select b, e, 
               max(c) as max_c, max(f) as max_f
        from have
        where upcase(b) = upcase(e)
        group by b, e
    ;
quit;

Step 2: Get the max values of when b = e into the table, sum up all the cases by a where b = e

proc sql noprint;
    create table want as
        select a, sum(result) as result

        /* Get max values into the table. 
           Only interested in cases where a = b */
        from(select st1.a, st1.b, st1.e, 
                    max_c, max_f, 
                    (max_c - max_f)*st1.d as result
             from have as st1
             INNER JOIN
                  maxes as st2
             ON upcase(st1.b) = upcase(st2.b)
                AND upcase(st1.e) = upcase(st2.e)
             where upcase(st1.b) = upcase(st1.e) )
        group by a
    ;
quit;

Now, if you want to do this for the table maximum of c and f, you can do it with macro variables, the table sorted by a, by-group processing, and a Sum Statement:

Step 1: Read maximum values of c and f into macro variables

proc sql noprint;
    select max(c), max(f)
    into :max_c, :max_f
    from have;
quit;

Step 2: Sum the result for each a

data want;
    set have;
    by a;
    where upcase(b) = upcase(e);

    result+( (&max_c - &max_f)*d );

    if(last.a) then do;
        output;
        call missing(result); *Reset the sum for the next a group;
    end;

    keep a result;
run;

I hope this helps! This is just one way you could go about solving the problem, but there are many other great ways. It all depends upon your goals, environment, and programming style.

Stu Sztukowski
  • 10,597
  • 1
  • 12
  • 21
0

So a basic data step or a proc call is already a looping over the input data. If you want to use a dataset as the source for parameter values for a macro call then a simple driver program is what you need.

Let's assume you have already defined a macro that performs some complex series of data steps and proc steps and takes one or more input parameters. Then you can create a dataset of parameter values and in a simple data step use CALL EXECUTE to generate calls to the macro passing the parameter values. So if the macro %MYMACRO() is defined with 5 positional parameters and you have the data set HAVE with five variables PARM1 to PARM5 and 10 observations then this data step will generate 10 macro calls.

 data _null_;
    set have;
    call execute(cats('%nrstr(%mymacro)(',catx(',',of parm1-parm5),')'));
 run;
Tom
  • 47,574
  • 2
  • 16
  • 29