How can you create a SAS data set from another dataset using only the last n observations from original dataset. This is easy when you know the value of n. If I don't know 'n' how can this be done?
-
2What do you mean you don't know 'n'? How are you going to find out the value of 'n' - is it a dataset variable, a macro variable, a parameter? – Joe Apr 23 '13 at 13:58
5 Answers
This assumes you have a macro variable that says how many observations you want. NOBS tells you the number of observations in the dataset currently without reading the whole thing.
%let obswant=5;
data want;
set sashelp.class nobs=obscount;
if _n_ gt (obscount-&obswant.);
run;

- 62,789
- 6
- 49
- 67
Using Joe's example of a macro variable to specify the number of observations you want, here is another answer:
%let obswant = 10;
data want;
do _i_=nobs-(&obswant-1) to nobs;
set have point=_i_ nobs=nobs;
output;
end;
stop; /* Needed to stop data step */
run;
This should perform better since it only reads the specific observations you want.

- 9,498
- 5
- 29
- 56
-
I'm curious about the conditions for "this should perform better". I suspect it's some of each; random access is not as fast as sequential access, so there is some loss there. May test it out if I have some time. I suspect if you are using most of the observations that it will be faster to do it sequentially, but if it is a large dataset and you want a small number of observations it is faster to do it random=access. Makes me wish SAS had a way to open a dataset in reverse order without having to sort it first. – Joe Apr 23 '13 at 14:52
-
3@Joe Imagine a dataset with 1 million obs and you want the "last" 10. Using this technique, only 10 "input" operations are performed. Using just the NOBS value requires 1 million "input" operations. Depending on how "wide" the data set is, this difference can be very significant. – BellevueBob Apr 23 '13 at 14:59
-
1Right, I certainly see the possibility for this being faster. I am curious at which point it is faster/slower (ie, what size of dataset, what % of data being pulled - 10 from 1MM of course would be faster, but what about 500k from 1MM?) – Joe Apr 23 '13 at 16:16
If the dataset is large, you might not want to read the whole dataset. Instead you could try a construction that reads the total number of Observations in the dataset first. So if you want to have the last of observations:
data t;
input x;
datalines;
1
2
3
4
;
%let dsid=%sysfunc(open(t));
%let num=%sysfunc(attrn(&dsid,nlobs));
%let rc=%sysfunc(close(&dsid));
%let number = 2;
data tt;
set t (firstobs = %eval(&num.-&number.+1));
run;

- 671
- 5
- 16
For the sake of variety, here's another approach (not necessarily a better one)
%let obswant=5;
proc sql noprint;
select nlobs-&obswant.+1 into :obscalc
from dictionary.tables
where libname='SASHELP' and upcase(memname)='CLASS';
quit;
data want;
set sashelp.class (firstobs=&obscalc.);
run;

- 7,582
- 13
- 19
-
1Note: you want to specify format when using select..into, otherwise the value is formatted as BEST8., which leads to weird bugs when you select a value >= 100,000,000 (it's formatted in exponential notation, leading to rounding). – Nickolay May 01 '13 at 03:59
You can achive this using the
_nobs_
and _n_
variables. First, create a temporary variable to store the total no of obs. Then compare the automatic variable N to nobs.
data a;
set sashelp.class nobs=_nobs_;
if _N_ gt _nobs_ -5;
run;

- 171
- 1
- 2
- 14