1

Is it possible to remove duplicated records in sequence inside a specific group and output only last of them (based od date) with 4GL (SAS)? I have data like:

data example;
input obs id dt value WANT_TO_SELECT;
cards;
1 10 1 500 0
2 10 2 750 1
3 10 3 750 1
4 10 4 750 0
5 10 5 500 0
6 20 1 150 1
7 20 2 150 0
8 20 3 370 0
9 20 4 150 0
;
run;

As You see for id=10 I would like to have only one (last) record with value 750, because there is one after the other while value 500 can be twice because they are separated. I was trying use last/first but I am not sure how to sort the data.

Dreamer
  • 13
  • 3

1 Answers1

2

Looks like a use case for the NOTSORTED keyword of the BY statement. This will let you use VALUE as a BY variable even though the data is not actually sorted by VALUE. That way the LAST.VALUE flag can be used.

data want;
  set example;
  by id value notsorted;
  if last.value;
run;

Results:

                                   WANT_TO_
Obs    obs    id    dt    value     SELECT

 1      1     10     1     500         0
 2      4     10     4     750         0
 3      5     10     5     500         0
 4      7     20     2     150         0
 5      8     20     3     370         0
 6      9     20     4     150         0
Tom
  • 47,574
  • 2
  • 16
  • 29
  • Whoouuu, great! I didn't know that option. You' re my hero. Thank You. – Dreamer Jul 12 '22 at 17:47
  • @Dreamer The subsetting expression `if FIRST.value;` would also work for this scenario. The core concept is that you only want a single row from a group of rows having a contiguous value. – Richard Jul 12 '22 at 19:44