0

I currently work in SAS and utilise arrays in this way:

Data Test;
input Payment2018-Payment2021;
datalines;

10 10 10 10
20 20 20 20
30 30 30 30
;
run;

In my opinion this automatically assumes a limit, either the start of the year or the end of the year (Correct me if i'm wrong please)

So, if I wanted to say that this is June data and payments are set to increase every 9 months by 50% I'm looking for a way for my code to recognise that my years go from end of June to the next end of june

For example, if I wanted to say

Data Payment_Pct;
set test;

lastpayrise = "31Jul2018";

array payment:
array Pay_Inc(2018:2021) Pay_Inc: ;

Pay_Inc2018 = 0;
Pay_Inc2019 = 2; /*2 because there are two increments in 2019*/
Pay_Inc2020 = 1;
Pay_Inc2021 = 1;

do I = 2018 to 2021;

    if i = year(pay_inc) then payrise(i) * 50% * Pay_Inc(i);

end;

run;

It's all well and good for me to manually do this for one entry but for my uni project, I'll need the algorithm to work these out for themselves and I am currently reading into intck but any help would be appreciated!

P.s. It would be great to have an algorithm that creates the following

Pay_Inc2019 Pay_Inc2020 Pay_Inc2021
1           2           1

OR, it would be great to know how the SAS works in setting the array for 2018:2021 , does it assume end of year or can you set it to mid year or?

78282219
  • 593
  • 5
  • 21
  • Variable names reflect nothing about the variable, unless you define it as such, so it's your definitions that matter.. If you want it to reflect a portion of the year, perhaps convert it to a monthly data set using PROC TIMESERIES or PROC EXPAND. – Reeza Sep 06 '18 at 16:23

2 Answers2

1

This is a wonderful use case of the intnx() function. intnx() will be your best friend when it comes to aligning dates.

In the traditional calendar, the year starts on 01JAN. In your calendar, the year starts in 01JUN. The difference between these two dates is exactly 6 months. We want to shift our date so that the year starts on 01JUN. This will allow you to take the year part of the date and determine what year you are on in the new calendar.

data want;
    format current_cal_year
           current_new_year year4.
    ;

    current_cal_year = intnx('year', '01JUN2018'd, 0, 'B');
    current_new_year = intnx('year.6', '01JUN2018'd, 1, 'B');
run;

Note that we shifted current_new_year by one year. To illustrate why, let's see what happens if we don't shift it by one year.

data want;
    format current_cal_year
           current_new_year year4.
    ;

    current_cal_year = intnx('year', '01JUN2018'd, 0, 'B');
    current_new_year = intnx('year.6', '01JUN2018'd, 0, 'B');
run;

current_new_year shows 2018, but we really are in 2019. For 5 months out of the year, this value will be correct. From June-December, the year value will be incorrect. By shifting it one year, we will always have the correct year associated with this date value. Look at it with different months of the year and you will see that the year part remains correct throughout time.

data want;
    format cal_month date9.
           cal_year
           new_year year4.
    ;

    do i = 0 to 24;
        cal_month = intnx('month', '01JAN2016'd, i, 'B');
        cal_year = intnx('year', cal_month, i, 'B');
        new_year = intnx('year.6', cal_month, i+1, 'B');
        year_not_same = (year(cal_year) NE year(new_year) );
        output;
    end;

    drop i;
run;
Stu Sztukowski
  • 10,597
  • 1
  • 12
  • 21
1

Regarding input Payment2018-Payment2021; there is no automatic assumption of yearness or calendaring. The numbers 2018 and 2021 are the bounds for a numbered range list

In a numbered range list, you can begin with any number and end with any number as long as you do not violate the rules for user-supplied names and the numbers are consecutive.

The meaning of the numbers 2018 to 2021 is up to the programmer. You state the variables correspond to the June payment in the numbered year.

You would have to iterate a date using 9-month steps and increment a counter based on the year in which the date falls.

Sample code

Dynamically adapts to the variable names that are arrayed.

data _null_;

  array payments payment2018-payment2021;
  array Pay_Incs pay_inc2018-pay_inc2021; * must be same range numbers as payments;

  * obtain variable names of first and last element in the payments array;
  lower_varname = vname(payments(1));
  upper_varname = vname(payments(dim(payments)));

  * determine position of the range name numbers in those variable names;
  lower_year_position = prxmatch('/\d+\s*$/', lower_varname);
  upper_year_position = prxmatch('/\d+\s*$/', upper_varname);

  * extract range name numbers from the variable names;
  lower_year = input(substr(lower_varname,lower_year_position),12.);
  upper_year = input(substr(upper_varname,upper_year_position),12.);

  * prepare iteration of a date over the years that should be the name range numbers;
  date = mdy(06,01,lower_year); * june 1 of year corresponding to first variable in array;

  format date yymmdd10.;

  do _n_ = 1 by 1; * repurpose _n_ for an infinite do loop with interior leave;
    * increment by 9-months;
    date = intnx('month', date, 9);

    year = year(date);    
    if year > upper_year then leave;

    * increment counter for year in which iterating date falls within;
    Pay_Incs( year - lower_year + 1 ) + 1;
  end;

  put Pay_Incs(*)=;
run;

Increment counter notes

There is a lot to unpack in this statement

    Pay_Incs( year - lower_year + 1 ) + 1;
  • + 1 at the end of the statement increments the addressed array element by 1, and is the syntax for the SUM Statement

    variable + expression

    The sum statement is equivalent to using the SUM function and the RETAIN statement, as shown here: retain variable 0; variable=sum(variable,expression);

  • year - lower_year + 1 computes the array base-1 index, 1..N, that addresses the corresponding variable in the named range list pay_inc<lower_year>-pay_inc<upper_year>

  • Pay_Incs( <computed index> ) selects the variable of the SUM statement

Richard
  • 25,390
  • 3
  • 25
  • 38