
I'm trying to understand a friend's code to see if I can find some inspiration for my dissertation. In one section he creates a dataset from three input datasets. What I don't understand is that he uses three SET statements, and the second and third use point = _N_.

What is the use of the following code?

        data Other;
            set One;
            set Two point = _N_;
            set Three point = _N_;
            array Rating[*] Unrated;
            array Amortising[*] '1'n;
            array Rating_old[*] old_Unrated;
            AM = 0;
            do i = 1 to dim(Rating);
                Rating[i] = Rating[i] + Rating_old[i] * Amortising[i];
            end;
        run;

The input datasets look like this

data one;
input Segment count weight ;
datalines;
1 0 0.1
99 1 0.2
;
run;

data two;
input block $ type $ '0'n :percent. '1'n :percent. '99'n :percent.;
datalines;
50 A 100% 10% 0%
50 S 100% 10% 0%
51 S 100% 10% 0%
52 S 100% 10% 0%
132 S 100% 12% 0%
;
run;

data three;
input DPD $ block type $ segment count weight;
datalines;
AM 50 S 1 0 0.1
Unrated 51 S 99 0.2
NPE 132 S 1 0.5
;
run;

Just looking to see what point = _N_ would be used for!

78282219
  • In this program it does nothing. The program would run exactly the same without the point= option on the last two set statements. – Tom Aug 20 '18 at 18:39

2 Answers


In this program it does nothing. The program would run exactly the same without the point= option on the last two set statements.

The POINT= option lets you access observations directly by observation number. The _N_ automatic variable is incremented once for each iteration of the data step. So on the first iteration the step reads the first observation from each of the three inputs, on the second iteration the second observation, and so on, which is exactly what would happen without the POINT= option.

Note that this program will stop when the first SET statement reads past the end of its input. Without POINT=, it would stop when ANY of the three SET statements attempted to read past the end of its input dataset. You can get the same behavior, and avoid the errors that POINT= writes to the SAS log once _N_ exceeds the number of observations, by using and testing the NOBS= option.

set One;
if _n_ <= nobs2 then set Two nobs=nobs2;
if _n_ <= nobs3 then set Three nobs=nobs3;
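
For illustration, here is that pattern as a complete step that can be run on its own. This is only a sketch: the datasets long_ds and short_ds and the variables id, x and y are made up for the example, and the call missing() line is an addition that is not part of the fragment above. Variables read by SET are retained across iterations, so without that line y would keep its last value once short_ds runs out.

```sas
/* Hypothetical inputs: long_ds has 3 rows, short_ds has 2 */
data long_ds;
    input id x;
    datalines;
1 10
2 20
3 30
;
run;

data short_ds;
    input y;
    datalines;
100
200
;
run;

data combined;
    /* Drives the step: it ends when this SET reads past the last row */
    set long_ds;
    /* NOBS= is filled in at compile time, so it can be tested before
       the SET statement that defines it executes */
    if _n_ <= nobs_short then set short_ds nobs=nobs_short;
    /* Assumption for this sketch: blank out y instead of carrying
       forward the value retained from the previous iteration */
    else call missing(y);
run;
```

The result should have three rows, with y missing on the third, and no errors in the log.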
Tom
  • I simplified the datasets too much then, I did not yet understand the purpose of the point statement and now it is clear, thank you. – 78282219 Aug 21 '18 at 05:24

Given the datasets shown, it doesn't do anything.

However, if the ONE dataset had more rows than one or both of the other two datasets, it would avoid the data step stopping when it ran out of rows from the shortest dataset. For example, run this:

    data Other;
        set Two;
        set One point = _N_;
        set Three point = _N_;
        array Rating[*] Unrated;
        array Amortising[*] '1'n;
        array Rating_old[*] old_Unrated;
        AM = 0;
        do i = 1 to dim(Rating);
            Rating[i] = Rating[i] + Rating_old[i] * Amortising[i];
        end;
    run;

The only change is swapping TWO and ONE. Now you get 5 rows, whereas without point=_n_ you would still get only two. So the program was likely written to ensure that all of ONE's rows are represented (similar to a left join in SQL, except that you're not joining on anything here). This would probably be clearer written as a merge, even without a BY statement if it's just a one-to-one merge. Usually, though, there's a valid key to merge on.
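
As a rough sketch of the one-to-one merge idea, here is a small example with made-up datasets (left_side, right_side) and variables (id, x, y) rather than the ones from the question. The question's datasets share variable names, and block is character in one and numeric in another, so they would need renaming or type fixes before they could be merged this way.

```sas
/* Hypothetical inputs: left_side has 3 rows, right_side has 2 */
data left_side;
    input id x;
    datalines;
1 10
2 20
3 30
;
run;

data right_side;
    input y;
    datalines;
100
200
;
run;

/* One-to-one merge: no BY statement, so rows are paired by position.
   The output has as many rows as the longest input, and variables
   from an exhausted input are set to missing. */
data merged;
    merge left_side right_side;
run;
```

Here merged should have three rows, with y missing on the third. If the inputs share a variable name, the value from the dataset listed later in the MERGE statement overwrites the earlier one, which is one reason a real merge key is usually safer.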

Joe