I'm conducting an A/B test and looking for an effective way to remove duplicate user IDs (the visitorId column) that appear in both groups: the experiment and the control.
Here is an example:
visitorId | date | group |
---|---|---|
4256040402 | 2019-08-31 | A |
4256040402 | 2019-08-31 | B |
4256040402 | 2019-08-27 | A |
4256040402 | 2019-08-20 | B |
And the desired result:
visitorId | date | group |
---|---|---|
4256040402 | 2019-08-31 | A |
4256040402 | 2019-08-27 | A |
4256040402 | 2019-08-20 | B |
I'm looking for an efficient approach that takes the date (date column) into account: a duplicate should be deleted only if the same visitorId appears in both groups on the same day.
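For reference, here is a minimal sketch of the rule described above. It assumes the data lives in a pandas DataFrame (the post does not name a library) and that the first row in the existing order is the one to keep, which matches the desired result where the group A row survives. The names `df`, `groups_per_day`, `is_repeat`, and `result` are illustrative only.

```python
import pandas as pd

# Sample data copied from the example table.
df = pd.DataFrame({
    "visitorId": [4256040402, 4256040402, 4256040402, 4256040402],
    "date": ["2019-08-31", "2019-08-31", "2019-08-27", "2019-08-20"],
    "group": ["A", "B", "A", "B"],
})

# For every row: how many distinct groups this visitor appears in on this date.
groups_per_day = df.groupby(["visitorId", "date"])["group"].transform("nunique")

# Rows that repeat an earlier (visitorId, date) pair, in the current row order.
is_repeat = df.duplicated(subset=["visitorId", "date"], keep="first")

# Drop a row only if it is a repeat AND its (visitorId, date) pair spans both groups.
# Assumption: the first occurrence is the one to keep.
result = df[~((groups_per_day > 1) & is_repeat)].reset_index(drop=True)

print(result)
```

If a visitor appearing twice in the same group on the same day should also count as a duplicate, `df.drop_duplicates(subset=["visitorId", "date"], keep="first")` would be a shorter alternative. Sorting the frame before either step controls which row is kept.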