Group By Interpolation Based on the Previous Row

Question

The goal is to add a new row whenever there is a gap between the date variable between two rows grouped by id. If the gap occurs, then duplicate a row that is first. However only the date feature should not be as the first row rather it should be incremented by one day.

Also, everything needs to be grouped by id. I need to achieve it without expanding the function.

data sample;
    input id date numeric_feature character_feature $;
    informat date yymmdd10.;
    datalines;

1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;

data sample;
    set sample;
    format date yymmdd10.;
run;

The desired result:

data sample;
    input id date numeric_feature character_feature $;
    informat date yymmdd10.;
    datalines;

1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-03 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-02 4 V
2 2020-01-03 1 B
2 2020-01-04 1 B
2 2020-01-05 9 F
;

data sample;
    set sample;
    format date yymmdd10.;
run;

Richard · Accepted Answer · 2020-12-31T17:43:10.263

You can perform a 1:1 self merge with the second self starting at row 2 in order to provide a lead value. A 1:1 merge does not use a BY statement.

Example:

data have;
    input id date numeric_feature character_feature $;
    informat date yymmdd10.;
      format date yymmdd10.;
    datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;

data want;
  * 1:1 merge without by statement;
  merge 
    have              /* start at row 1 */    
    have ( firstobs=2 /* start at row 2 for lead values */
           keep=id date            /* more data set options that prepare the lead */
           rename = ( id=nextid
                      date=nextdate
         )) 
  ;

  output;

  flag = '*';                  /* marker for filled in dates */
  if id = nextid then
    do date=date+1 to nextdate-1;
      output;
    end;

  drop next:;
run;

Result flagging filled in dates

score 1 · Answer 2 · answered Dec 31 '20 at 17:08

To "look ahead" you can re-read the same dataset starting from the second observation. SAS will stop when you read past the end of the input so add an extra empty observation.

data sample;
  input id date numeric_feature character_feature $;
  informat date yymmdd.;
  format date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;

data want;
  set sample;
  by id;
  set sample(firstobs=2 keep=date rename=(date=next_date)) sample(obs=1 drop=_all_);
  output;
  if not last.id then do date=date+1 to next_date-1; output; end;
run;

Results:

                           numeric_    character_
Obs    id       date        feature     feature       next_date

  1     1    2020-01-01        5           A         2020-01-02
  2     1    2020-01-02        3           Z         2020-01-04
  3     1    2020-01-03        3           Z         2020-01-04
  4     1    2020-01-04        2           D         2020-01-05
  5     1    2020-01-05        7           B         2020-01-01
  6     2    2020-01-01        4           V         2020-01-03
  7     2    2020-01-02        4           V         2020-01-03
  8     2    2020-01-03        1           B         2020-01-05
  9     2    2020-01-04        1           B         2020-01-05
 10     2    2020-01-05        9           F                  .

Group By Interpolation Based on the Previous Row

2 Answers2