0

I have to create complex (for me) counter variables in a dataset. I am trying to explain as clearly as possible. If anything unclear please let me know. Hope with your help I can achieve what I am expecting.

I need to create three variables: Probation_Count, Probation_Flag and Cure_Count.

Creating three variables are CID specific (we are grouping by CID).

Probation_Count and Probation_Flag conditions

  • Condition 1 - probation_count starts from 1 when a contract goes from Default_Flag =Y to Default_Flag = N, and probation_flag = Y.
  • Condition 2 - Probation_count will increment as long as DPD = 0 and Default_Flag =N, probation flag =Y

enter image description here

  • Condition 3 - when DPD >0 and DPD <= 3 and Defult_Flag=N, probation_count has to stay at the value when DPD = 0; probation_count will start to increase once DPD =0 and default_flag = N, probation_flag =Y

enter image description here

  • Condition 4 - when DPD >3 and default_flag = N then Probation count resets to 0 unti DPD = 0 and default_Flag=N, probation flag = Y

enter image description here

  • Condition 5 - probation_count can increase until 10, and then resets to 0, probation_flag = Y until probation count =10
  • Condition 6 - When ever Default_Flag = Y then probation_count = 0 and Probation_flag = N. In order to start the probation_count contract has to move from default_flag= Y to Default_flag=N.

Cure_count conditions

  • Condition 1 - cure_count starts from 1 when previous date probation_count was 10 and current date default_flag = N
  • Condition 2 - Cure_count will increase until default flag = Y or Cure_count = 10

Please find the sample data below.

I have manually calculated probation_count, probation_flag and cure_count.

    data sample;
INFILE DATALINES DLM='#';
input CID date ddmmyy10. DPD Default_Flag $ Probation_Count probation_Flag $ Cure_count;
format date ddmmyy10.;
datalines;
111#04/04/2021#87#N#00# #0
111#05/04/2021#88#N#00# #0
111#06/04/2021#89#N#00# #0
111#07/04/2021#90#Y#00# #0
111#08/04/2021#91#Y#00# #0
111#09/04/2021#92#Y#00# #0
111#10/04/2021#93#Y#00# #0
111#11/04/2021#00#N#01#Y#0
111#12/04/2021#00#N#02#Y#0
111#13/04/2021#00#N#03#Y#0
111#14/04/2021#00#N#04#Y#0
111#15/04/2021#00#N#05#Y#0
111#16/04/2021#01#N#05#Y#0
111#17/04/2021#02#N#05#Y#0
111#18/04/2021#00#N#06#Y#0
111#19/04/2021#00#N#07#Y#0
111#20/04/2021#00#N#08#Y#0
111#21/04/2021#00#N#09#Y#0
111#22/04/2021#00#N#10#Y#0
111#23/04/2021#00#N#00# #1
111#24/04/2021#00#N#00# #2
111#25/04/2021#00#N#00# #3
222#04/04/2021#86#N#00# #0
222#05/04/2021#87#N#00# #0
222#06/04/2021#88#N#00# #0
222#07/04/2021#89#N#00# #0
222#08/04/2021#90#Y#00# #0
222#09/04/2021#91#Y#00# #0
222#10/04/2021#92#Y#00# #0
222#11/04/2021#93#Y#00# #0
222#12/04/2021#94#Y#00# #0
222#13/04/2021#95#Y#00# #0
222#14/04/2021#96#Y#00# #0
333#04/04/2021#87#N#00# #0
333#05/04/2021#88#N#00# #0
333#06/04/2021#89#N#00# #0
333#07/04/2021#90#Y#00# #0
333#08/04/2021#91#Y#00# #0
333#09/04/2021#92#Y#00# #0
333#10/04/2021#00#N#01#Y#0
333#11/04/2021#00#N#02#Y#0
333#12/04/2021#00#N#03#Y#0
333#13/04/2021#00#N#04#Y#0
333#14/04/2021#00#N#05#Y#0
333#15/04/2021#00#N#06#Y#0
333#16/04/2021#01#N#05#Y#0
333#17/04/2021#02#N#05#Y#0
333#18/04/2021#03#N#05#Y#0
333#19/04/2021#04#N#00#Y#0
333#20/04/2021#05#N#00#Y#0
333#21/04/2021#00#N#01#Y#0
333#22/04/2021#00#N#02#Y#0
333#23/04/2021#00#N#03#Y#0
333#24/04/2021#00#N#04#Y#0
333#25/04/2021#00#N#05#Y#0
333#26/04/2021#00#N#06#Y#0
333#27/04/2021#00#N#07#Y#0
333#28/04/2021#00#N#08#Y#0
333#29/04/2021#00#N#09#Y#0
333#30/04/2021#00#N#10#Y#0
333#01/05/2021#00#N#00# #1
333#02/05/2021#00#N#00# #2
333#03/05/2021#00#N#00# #3
333#04/05/2021#90#Y#00# #0
333#05/05/2021#91#Y#00# #0
;
run;

Thank you so much for your time and help

Myurathan Kajendran
  • 647
  • 3
  • 8
  • 15
  • In the condition 4 picture, why does probation_count decrease from `6` to `5` ? Are your rules 'overwriting' values in existing columns, or are you computing `Probation_Count`, `Probation_Flag` and `Cure_Count` directly from state changes and values of only `dpd` and `default_flag` ? – Richard Sep 17 '20 at 06:20

1 Answers1

0

The data and explanations are not 100% clear, but this sample code might help you fully realize the complex rules you are attempting.

I need to create three variables: Probation_Count, Probation_Flag and Cure_Count.

I would expect this to mean these variables and their values can only be computed from the state and changed state of default_flag and dpd. You don't make it clear how or if a value computed in the prior row should be carried forward into the next rows computation.

Example:

data have;
INFILE DATALINES DLM='#';
input CID date ddmmyy10. DPD Default_Flag $ Probation_Count_X Probation_Flag_X $ Cure_Count_X;
format date ddmmyy10.;
datalines;
111#04/04/2021#87#N#00# #0
111#05/04/2021#88#N#00# #0
111#06/04/2021#89#N#00# #0
111#07/04/2021#90#Y#00# #0
111#08/04/2021#91#Y#00# #0
111#09/04/2021#92#Y#00# #0
111#10/04/2021#93#Y#00# #0
111#11/04/2021#00#N#01#Y#0
111#12/04/2021#00#N#02#Y#0
111#13/04/2021#00#N#03#Y#0
111#14/04/2021#00#N#04#Y#0
111#15/04/2021#00#N#05#Y#0
111#16/04/2021#01#N#05#Y#0
111#17/04/2021#02#N#05#Y#0
111#18/04/2021#00#N#06#Y#0
111#19/04/2021#00#N#07#Y#0
111#20/04/2021#00#N#08#Y#0
111#21/04/2021#00#N#09#Y#0
111#22/04/2021#00#N#10#Y#0
111#23/04/2021#00#N#00# #1
111#24/04/2021#00#N#00# #2
111#25/04/2021#00#N#00# #3
222#04/04/2021#86#N#00# #0
222#05/04/2021#87#N#00# #0
222#06/04/2021#88#N#00# #0
222#07/04/2021#89#N#00# #0
222#08/04/2021#90#Y#00# #0
222#09/04/2021#91#Y#00# #0
222#10/04/2021#92#Y#00# #0
222#11/04/2021#93#Y#00# #0
222#12/04/2021#94#Y#00# #0
222#13/04/2021#95#Y#00# #0
222#14/04/2021#96#Y#00# #0
333#04/04/2021#87#N#00# #0
333#05/04/2021#88#N#00# #0
333#06/04/2021#89#N#00# #0
333#07/04/2021#90#Y#00# #0
333#08/04/2021#91#Y#00# #0
333#09/04/2021#92#Y#00# #0
333#10/04/2021#00#N#01#Y#0
333#11/04/2021#00#N#02#Y#0
333#12/04/2021#00#N#03#Y#0
333#13/04/2021#00#N#04#Y#0
333#14/04/2021#00#N#05#Y#0
333#15/04/2021#00#N#06#Y#0
333#16/04/2021#01#N#05#Y#0
333#17/04/2021#02#N#05#Y#0
333#18/04/2021#03#N#05#Y#0
333#19/04/2021#04#N#00#Y#0
333#20/04/2021#05#N#00#Y#0
333#21/04/2021#00#N#01#Y#0
333#22/04/2021#00#N#02#Y#0
333#23/04/2021#00#N#03#Y#0
333#24/04/2021#00#N#04#Y#0
333#25/04/2021#00#N#05#Y#0
333#26/04/2021#00#N#06#Y#0
333#27/04/2021#00#N#07#Y#0
333#28/04/2021#00#N#08#Y#0
333#29/04/2021#00#N#09#Y#0
333#30/04/2021#00#N#10#Y#0
333#01/05/2021#00#N#00# #1
333#02/05/2021#00#N#00# #2
333#03/05/2021#00#N#00# #3
333#04/05/2021#90#Y#00# #0
333#05/05/2021#91#Y#00# #0
;

data want;
  length rule $1 probation_count 8 probation_flag $1 cure_count 8;
  length trigger_counting pcounting 8;

  retain pcounting probation_count;

  set have;
  by cid;

  rule = ' ';

  if first.cid then do;
    probation_count = 0;
    probation_flag = ' ';
    trigger_counting = 0;
    pcounting = 0;
  end;

  trigger_counting = 
    default_flag = 'N'
    and
    ( lag(default_flag) = 'Y' and NOT first.cid )
  ;

  if default_flag = 'N' then do;

    * set the counting flag 'pcounting' and initialize count;

    if trigger_counting then do;
      pcounting = 1;
      probation_count = 1;
      probation_flag = 'Y';
      rule = '1';
      return;
    end;

    * increment count for no dpd, reset if necessary;

    if pcounting and dpd = 0 then do;
      probation_count + 1;
      probation_flag = 'Y';
      rule = '2';

      if probation_count > 10 then do;
        probation_count = 0;
        rule = '5';
      end;

      return;
    end;

    * pause counting for few dpd;

    if pcounting and 0 < dpd <= 3 then do;
      probation_flag = 'Y';
      rule = '3';
      return;
    end;

    * reset counting for high dpd;

    if pcounting and dpd > 3 then do;
      probation_count = 0;
      probation_flag = 'Y';
      rule = '4';
      return;
    end;
  end;
  else
  if default_flag = 'Y' then do;
    probation_count = 0;
    probation_flag = 'N';
    rule = '6';
  end;
  else do;
    put 'ERROR: ' default_flag= _n_=;
    stop;
  end;

*  drop trigger_counting pcounting;
run;
Richard
  • 25,390
  • 3
  • 25
  • 38