0

I'm looking to add a sequence column to my sas dataset, but according to ids and transaction dates. To illustrate, below is the table I'm referring to:

ID   |   TXN_DT    |    
01   |  01JAN2020  |    
01   |  01JAN2020  |    
01   |  02JAN2020  |    
01   |  03JAN2020  |    
02   |  01JAN2020  |    
02   |  02JAN2020  |    
02   |  02JAN2020  |    
02   |  03JAN2020  |    
02   |  03JAN2020  |  

and I want to add a sequence like so:

ID   |   TXN_DT    |   SEQ  | 
01   |  01JAN2020  |    1   |
01   |  01JAN2020  |    1   |
01   |  02JAN2020  |    2   |
01   |  03JAN2020  |    3   |
02   |  01JAN2020  |    1   |
02   |  02JAN2020  |    2   |
02   |  02JAN2020  |    2   |
02   |  03JAN2020  |    3   |
02   |  03JAN2020  |    3   |

I'm trying to run the following code, but it seems to jump a row up and not copying the previous' row's value, and instead skips to 2 rows above.

data want;
set have;
by id;
if first.id then seq=1;
else seq+1;
if txn_dt=lag(txn_dt) then seq = lag(seq);
keep id seq txn_dt;
run;

any help? Thanks in advance!

Lasmyr
  • 39
  • 4
  • I would probably do it slightly differently and only increment `seq` when `txn_dt ^= lag(txn_dt)` – mjsqu Feb 14 '20 at 01:58

4 Answers4

1

Try

if first.id then seq=0;
seq + (first.id or txn_dt ne lag(txn_dt);
Richard
  • 25,390
  • 3
  • 25
  • 38
0

Try to use retain and first.

data want(drop=txn_dt_group);
    set have;
    by id txn_dt;
    retain txn_dt_group seq;
    if first.id then do;
        txn_dt_group=txn_dt;
        seq=1;
    end;
    if txn_dt ne txn_dt_group then do;
        seq=seq+1;
        txn_dt_group=txn_dt;
    end;    
run;

Output:

+-----------+----+-----+
|  txn_dt   | ID | seq |
+-----------+----+-----+
| 01JAN2020 |  1 |   1 |
| 01JAN2020 |  1 |   1 |
| 02JAN2020 |  1 |   2 |
| 03JAN2020 |  1 |   3 |
| 01JAN2020 |  2 |   1 |
| 02JAN2020 |  2 |   2 |
| 02JAN2020 |  2 |   2 |
| 03JAN2020 |  2 |   3 |
| 03JAN2020 |  2 |   3 |
+-----------+----+-----+
Llex
  • 1,770
  • 1
  • 12
  • 27
0
data want;
set have;
by id txn_dt;
if first.id then seq=1;
else if first.txn_dt then seq+1;
run;

I think that should do it.

Jonathan Wilson
  • 626
  • 6
  • 14
0

For completeness, here is a hash solution that does not depend on the order of your data.

data have;
input ID $ TXN_DT :date9.;
infile datalines dlm='|';
format TXN_DT date9.;
datalines;
01|01JAN2020
01|01JAN2020
01|02JAN2020
01|03JAN2020
02|01JAN2020
02|02JAN2020
02|02JAN2020
02|03JAN2020
02|03JAN2020
;

data want(drop=rc);
   if _N_ = 1 then do;
      dcl hash h1 ();
      h1.definekey ('ID', 'TXN_DT');
      h1.definedata ('SEQ');
      h1.definedone ();
      dcl hash h2 ();
      h2.definekey ('ID');
      h2.definedata ('SEQ');
      h2.definedone ();

      do until (lr);
         set have end=lr;

         if h2.find() = 0 then do;
            if h1.check() ne 0 then seq + 1;
         end;
         else seq = 1;

         h1.ref();
         h2.replace();
      end;
   end;

   set have;
   rc = h1.find();
run;
PeterClemmensen
  • 4,018
  • 3
  • 14
  • 22