7

How can you create a SAS data set from another dataset using only the last n observations from original dataset. This is easy when you know the value of n. If I don't know 'n' how can this be done?

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Riyaz Iqbal
  • 71
  • 1
  • 1
  • 3
  • 2
    What do you mean you don't know 'n'? How are you going to find out the value of 'n' - is it a dataset variable, a macro variable, a parameter? – Joe Apr 23 '13 at 13:58

5 Answers5

8

This assumes you have a macro variable that says how many observations you want. NOBS tells you the number of observations in the dataset currently without reading the whole thing.

%let obswant=5;
data want;
set sashelp.class nobs=obscount;
if _n_ gt (obscount-&obswant.);
run;
Joe
  • 62,789
  • 6
  • 49
  • 67
4

Using Joe's example of a macro variable to specify the number of observations you want, here is another answer:

%let obswant = 10;
data want;
   do _i_=nobs-(&obswant-1) to nobs;
      set have point=_i_ nobs=nobs;
      output;
      end;
   stop;  /* Needed to stop data step */
run;

This should perform better since it only reads the specific observations you want.

BellevueBob
  • 9,498
  • 5
  • 29
  • 56
  • I'm curious about the conditions for "this should perform better". I suspect it's some of each; random access is not as fast as sequential access, so there is some loss there. May test it out if I have some time. I suspect if you are using most of the observations that it will be faster to do it sequentially, but if it is a large dataset and you want a small number of observations it is faster to do it random=access. Makes me wish SAS had a way to open a dataset in reverse order without having to sort it first. – Joe Apr 23 '13 at 14:52
  • 3
    @Joe Imagine a dataset with 1 million obs and you want the "last" 10. Using this technique, only 10 "input" operations are performed. Using just the NOBS value requires 1 million "input" operations. Depending on how "wide" the data set is, this difference can be very significant. – BellevueBob Apr 23 '13 at 14:59
  • 1
    Right, I certainly see the possibility for this being faster. I am curious at which point it is faster/slower (ie, what size of dataset, what % of data being pulled - 10 from 1MM of course would be faster, but what about 500k from 1MM?) – Joe Apr 23 '13 at 16:16
4

If the dataset is large, you might not want to read the whole dataset. Instead you could try a construction that reads the total number of Observations in the dataset first. So if you want to have the last of observations:

data t;
  input x;
datalines;
1
2
3
4
;

%let dsid=%sysfunc(open(t));
%let num=%sysfunc(attrn(&dsid,nlobs));
%let rc=%sysfunc(close(&dsid));
%let number = 2;

data tt;
set t (firstobs =  %eval(&num.-&number.+1));
run;
user1965813
  • 671
  • 5
  • 16
2

For the sake of variety, here's another approach (not necessarily a better one)

%let obswant=5;

proc sql noprint;
select nlobs-&obswant.+1 into :obscalc
from dictionary.tables
where libname='SASHELP' and upcase(memname)='CLASS';
quit;

data want;
set sashelp.class (firstobs=&obscalc.);
run;
Longfish
  • 7,582
  • 13
  • 19
  • 1
    Note: you want to specify format when using select..into, otherwise the value is formatted as BEST8., which leads to weird bugs when you select a value >= 100,000,000 (it's formatted in exponential notation, leading to rounding). – Nickolay May 01 '13 at 03:59
0

You can achive this using the _nobs_ and _n_ variables. First, create a temporary variable to store the total no of obs. Then compare the automatic variable N to nobs.

data a;
set sashelp.class nobs=_nobs_;
if _N_ gt _nobs_ -5;
run;
subhro
  • 171
  • 1
  • 2
  • 14