
I am a SAS beginner. I have an array-like piece of data in my code, which needs to be passed to a different data step much lower in the code to do computations with it. My code does something like this (computation simplified for this example):

data _null_;
    call symput('numRuns', 10000);
run;

/* this is the pre-computation step, building CompressArray for later use */
data _null_;
    do i = 1 to &numRuns;
        value = exp(rand('NORMAL', 0.1, 0.5));
        call symput(compress('CompressArray'||i), value);
    end;
run;

data reportData;
    set veryLargeDataSet; /* 100K+ observations on 30+ vars */
    array outputValues[10000];

    do i = 1 to &numRuns;
        precomputedValue = symget(compress('CompressArray'||i));
        outputValues[i] = /* calculation using precomputedValue */
    end;
run;

I am trying to redo this using arrays. Is that possible? For example, could I store the values in some global array and access them later?

gt6989b
  • Is this the exact code you're trying to replicate or a simplified example? – Reeza Aug 05 '15 at 19:17
  • @Reeza the computation piece is about the same, but the last step is much more complex. – gt6989b Aug 05 '15 at 19:19
  • @gt6989b You should probably include a more complete example of your actual problem; what you describe above could be trivially done without any need for macro variable arrays. In particular, is the creation of the macro variable actually just `exp(i+1)`? In that case why not just incorporate that in the final step (`total = total + exp(i+1);`)? – Joe Aug 05 '15 at 19:38
  • As far as macro variable arrays, look at my answer [here](http://stackoverflow.com/questions/29598841/quote-array-element-inside-macro-do-loop/29599400#29599400); is that something like what you're trying to do? It's probably not the best way to do what you're doing, but it could work. – Joe Aug 05 '15 at 19:40
  • @Joe I updated the question code – gt6989b Aug 05 '15 at 19:45
  • @Reeza I updated the question to more realistic code, much closer to what is actually going on – gt6989b Aug 05 '15 at 19:46
  • I don't have time to code now, but I would assume that a format would be simpler and faster. – Reeza Aug 05 '15 at 19:50
  • But still - the value coming into precomputedvalue in your real code is just `exp(i)` or some calculation based on that? Or is it entirely unrelated (perhaps a value from a table or something)? – Joe Aug 05 '15 at 19:54
  • @Joe it is really `exp(rand('NORMAL', &meanNorm, &stDevNorm));` -- i just want those to be the same for each `VeryLargeDataSet` record on a fixed path – gt6989b Aug 05 '15 at 19:57
  • Ah, okay. So you're doing basically a bunch of simulations. – Joe Aug 05 '15 at 19:58
  • @Joe exactly. updated the code with that... – gt6989b Aug 05 '15 at 19:59

2 Answers


Arrays in SAS only exist for the duration of the data step in which they are created. You would need to save the contents of your array in a dataset or, as you have done, in a series of macro variables.

Alternatively, you might be able to rewrite some of your code to do all of the work that uses the array within one data step. DOW-loops are quite good in this regard.

Based on the updates to your question, it sounds as though you could use a temporary array to do what you want:

data reportData;
    set veryLargeDataSet; /* 100K+ observations on 30+ vars */
    array outputValues[&numruns];

    array precomputed[&numruns] _temporary_;
    if _n_ = 1 then do i = 1 to &numruns;
        if i = 1 then call streaminit(1);
        precomputed[i] = exp(rand('NORMAL', &meanNorm, &stDevNorm));
    end;

    do i = 1 to &numRuns;
        outputValues[i] = /* calculation using precomputed[i] */
    end;
run;

Defining an array as _temporary_ causes the values of the array elements to be retained across iterations of the data step, so you only have to populate it once and then you can use it for the rest of the data step.
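A minimal sketch of that retention behaviour, under stated assumptions: `sashelp.class` stands in for `veryLargeDataSet`, `numRuns` is set to a hypothetical 5 so the result is easy to inspect, the distribution parameters are the ones from the question, and the multiplication by `height` is only a placeholder for the real calculation. This variant puts `call streaminit` before the inner loop, inside the `if _n_ = 1` block.

```sas
/* Assumption: small illustrative count, not the real 10000 */
%let numRuns = 5;

data reportData;
    set sashelp.class;  /* stand-in for veryLargeDataSet */
    array outputValues[&numRuns];
    array precomputed[&numRuns] _temporary_;

    /* Runs on the first iteration only; the _temporary_ array
       keeps these values for every later observation. */
    if _n_ = 1 then do;
        call streaminit(1);
        do i = 1 to &numRuns;
            precomputed[i] = exp(rand('NORMAL', 0.1, 0.5));
        end;
    end;

    do i = 1 to &numRuns;
        /* placeholder calculation using the precomputed value */
        outputValues[i] = precomputed[i] * height;
    end;
    drop i;
run;
```

Because `_temporary_` array elements have no variable names, they are not written to `reportData`; only `outputValues1` through `outputValues5` appear in the output. Dividing any `outputValues` column by `height` should give the same number on every row, which is a quick way to confirm the values were generated once.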

user667489
  • My computational data step uses a data set of 100K+ records, running each of the 10K computations in my code for each of those records. How would I recycle the data in the same step without having pre-calc repeated for each record?? – gt6989b Aug 05 '15 at 19:24
  • He's describing a macro array, presumably, which while not a technical feature of the language, is something i'd call acceptable terminology at this point. – Joe Aug 05 '15 at 19:36
  • You can make two passes through the same dataset within the same data step, keeping your calculations from the first pass in an array and referring back to them during the second pass. Have a look at double dow-loops in [here](http://analytics.ncsu.edu/sesug/2010/BB13.Dorfman.pdf). – user667489 Aug 05 '15 at 19:38
  • @user667489 i don't think this is quite the problem. I updated my question code, it should now be much clearer. Thank you for trying to help – gt6989b Aug 05 '15 at 19:47
  • Won't both loops in your code get executed once for each record in `veryLArgeDataSet`? I need the precomputed values to be populated *once* and then used for each record – gt6989b Aug 05 '15 at 20:13
  • No, the first loop will only execute once, just after reading in the first row from `veryLargeDataset` - that's what the `if _n_ = 1` is for. – user667489 Aug 05 '15 at 20:14
  • thank you, i appreciate your help, this is very nice – gt6989b Aug 05 '15 at 20:18
  • If you don't need the values saved for future use, I think this is the way to go. This is nearly the same as my second example, except skipping the output of those values - so if you do want them, do it like in mine. – Joe Aug 05 '15 at 20:25
  • Don't forget to include a `call streaminit(some number)` of course inside your `if _n_=1` block before the `rand`, also, so you can reproduce your data :) – Joe Aug 05 '15 at 20:25
  • @Joe i thought you only call `streaminit()` once in the entire code, no? – gt6989b Aug 05 '15 at 20:30
  • shouldn't your call to `streaminit` be outside of the `do...end` block but under the `if`? – gt6989b Aug 05 '15 at 20:32
  • @gt6989b Once in the datastep, yes - hence in the `if _n_=1` block. You'd have to have two separate blocks (so `if _n_=1 then do; call streaminit...; do i = 1 to ...; ` – Joe Aug 05 '15 at 20:33
  • (sorry, don't know how to format this in comments) Wouldn't this be better: `if _n_ = 1 then do call streaminit(1); do i = 1 to &numruns; precomputed[i] = exp(...); end; end;` – gt6989b Aug 05 '15 at 20:48
  • @gt6989b That's roughly how I would do it, yes (with appropriate semicolons). The way above is fine also but reads slightly oddly. – Joe Aug 05 '15 at 20:52
  • @Joe the way above will cost you extra 10,000 comparisons :-) – gt6989b Aug 05 '15 at 20:57
  • I *think* SAS would optimize that out (since the `1` is a constant), but not 100% sure. – Joe Aug 05 '15 at 20:57

There are a lot of ways to do this, but the hash table lookup is one of the most straightforward.

%let meannorm=5;
%let stDevNorm=1;
%let numRuns=10000;

/* this is the pre-computation step, building my_values (one row per run) for later use */
data my_values;
    call streaminit(7);
    do i = 1 to &numRuns;
        Value= rand('Normal',&meannorm., &stDevNorm.);
        output;
    end;
run;


data reportData;
    if _n_=1 then do;
        declare hash h(dataset:'my_values');
        h.defineKey('i');  *the "key" you are looking up from;
        h.defineData('value'); *what you want back;
        h.defineDone();
        call missing(of i value);
    end;
    set sashelp.class; /* 100K+ observations on 30+ vars */
    array outputValues[&numRuns.];

    do i = 1 to &numRuns;
        rc=h.find();
        outputValues[i] = value;
    end;
run;

Basically, you need to 'load' the table in some fashion and do [something] with it. Here's one easy way.

In your particular example there's another pretty simple way: bring it in as an array.

In this case we don't put 10k rows out, but 10k variables - then declare it as an array (again) in the new data step. (Arrays are, as noted by user667489, transient; they're not stored on the dataset in any way, except as the underlying variables, so they have to be re-declared each data step.)

%let meannorm=5;
%let stDevNorm=1;
%let numRuns=10000;

/* this is the pre-computation step, building my_values (one row, one variable per run) for later use */
data my_values;
    call streaminit(7);
    array values[&numruns.];
    do i = 1 to &numRuns;
        Values[i]= rand('Normal',&meannorm., &stDevNorm.);
    end;

run;


data reportData;
    if _n_=1 then set my_values(drop=i);
    set sashelp.class; /* 100K+ observations on 30+ vars */
    array outputValues[&numruns.];
    array values[&numruns.];  *this comes from my_values;
    do i = 1 to &numRuns;
        outputValues[i] = values[i];
    end;
    drop values:;
run;

Note that the set for my_values is still inside if _n_=1. my_values has only one row, so an unconditional set would hit end-of-file on the second iteration and terminate the data step after the first one.

You could also use a format, as Reeza notes, or several other options - but I think these are the simplest.
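For completeness, here is a rough sketch of what the format approach might look like. It is not from Reeza's comment, just one way it could be done. It assumes a hypothetical format name `precomp`, a small `numRuns`, the question's distribution parameters, and that storing the values as text labels is acceptable (round-tripping a number through text may lose a little precision).

```sas
/* Assumption: small illustrative count */
%let numRuns = 100;

/* Build a CNTLIN-style dataset: one row per run,
   START = run index, LABEL = precomputed value as text */
data precomp_fmt;
    call streaminit(7);
    retain fmtname 'precomp' type 'N';  /* hypothetical numeric format name */
    length label $32;
    do start = 1 to &numRuns;
        label = put(exp(rand('NORMAL', 0.1, 0.5)), best32.);
        output;
    end;
run;

/* Turn the dataset into a format */
proc format cntlin=precomp_fmt;
run;

data reportData;
    set sashelp.class;  /* stand-in for the large dataset */
    array outputValues[&numRuns];
    do i = 1 to &numRuns;
        /* look up the label for i and convert it back to a number */
        outputValues[i] = input(put(i, precomp.), 32.);
    end;
    drop i;
run;
```

The lookup goes through a character conversion on every access, so for a plain index-to-value mapping like this one the array or hash versions are likely the more direct choice.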

Joe