I'm working with SAS and I have a data frame like this:
Table1:
+------+------------+-----------+--------+
| name | date | time | price |
+------+------------+-----------+--------+
| A | 7-May-08 | 11:12:41 | 1 |
| A | 11-Jul-08 | 11:23:41 | 2 |
| A | 3-Jan-09 | 11:31:41 | 1 |
| A | 4-Jan-09 | 11:32:41 | 2 |
| A | 4-Jan-09 | 11:32:41 | 2 |
| A | 8-Jul-09 | 11:32:41 | 1 |
| A | 8-Jul-09 | 11:32:41 | 2 |
| A | 24-Jul-09 | 11:32:41 | 3 |
| A | 24-Jul-09 | 11:32:41 | 4 |
| A | 8-Dec-09 | 12:32:41 | 1 |
| B | 7-May-08 | 11:31:41 | 2 |
| B | 10-May-08 | 11:32:41 | 3 |
| B | 17-May-08 | 11:33:41 | 4 |
| B | 24-May-08 | 11:34:41 | 1 |
| B | 1-Jun-08 | 11:35:41 | 5 |
| B | 18-Jun-08 | 11:36:41 | 1 |
| B | 9-May-09 | 11:37:41 | 3 |
| C | 7-Oct-09 | 11:21:41 | 3 |
| C | 17-Oct-09 | 11:22:41 | 2 |
| C | 25-Oct-09 | 11:32:41 | 1 |
| C | 18-Nov-09 | 11:33:41 | 3 |
| C | 4-Dec-09 | 11:12:41 | 4 |
| C | 19-Dec-09 | 10:22:41 | 1 |
| C | 9-May-10 | 11:42:41 | 3 |
| C | 9-May-10 | 11:12:41 | 1 |
| C | 10-May-10 | 12:52:41 | 2 |
+------+------------+-----------+--------+
I have another data frame like this:
table2:
+------+-----------+
| name | date |
+------+-----------+
| A | 11-Jul-08 |
| A | 3-Jan-09 |
| A | 24-Jul-09 |
| B | 7-May-08 |
| B | 17-May-08 |
| B | 18-Jun-08 |
| B | 9-Jul-09 |
| C | 17-Oct-09 |
| C | 4-Dec-09 |
| C | 19-Dec-09 |
+------+-----------+
Now I want to do two main operations:
1- If date and name in table2 is in table1 then delete that row in table1;
2- If the previous step happened then delete the next row of that name and date and if name and date of the next row is repeated for other next rows then delete all of them.
For example table1 at last should be like this:
+------+-----------+----------+-------+
| name | date | time | price |
+------+-----------+----------+-------+
| A | 7-May-08 | 11:12:41 | 1 |
| A | 8-Jul-09 | 11:32:41 | 1 |
| A | 8-Jul-09 | 11:32:41 | 1 |
| B | 1-Jun-08 | 11:35:41 | 5 |
| C | 7-Oct-09 | 11:21:41 | 3 |
| C | 18-Nov-09 | 11:33:41 | 3 |
| C | 10-May-10 | 12:52:41 | 2 |
+------+-----------+----------+-------+
Here is a code which is not perfect for this operation because of two reasons:
1- Using the nodupkey option delete all duplicate observations from table1 which is not necessary. Because delete them is happened when conditions, which is described above, is satisfied.
2- And the "(inb=0 and lag(inb)=1 and not first.name)" statement delete just one next rows and the other next rows whit same name and date still be in table1.
proc sort data=table1 out=tablea1 nodupkey;
by name date;
run;
proc sort data=table2 out=tableb1 nodupkey;
by name date;
run;
data want;
merge tablea1 tableb1(in=inb) ;
by name date;
if inb or (inb=0 and lag(inb)=1 and not first.name) then delete;
run;
Thanks in advance.