Deleting duplicates in groups A/B

Question

I'm conducting an A/B test and looking for an effective way to delete duplicate user IDs (visitorId column) that appear in both groups, the experiment and the control.

Here is an example:

visitorId	date	group
4256040402	2019-08-31	A
4256040402	2019-08-31	B
4256040402	2019-08-27	A
4256040402	2019-08-20	B

And the desired result:

visitorId	date	group
4256040402	2019-08-31	A
4256040402	2019-08-27	A
4256040402	2019-08-20	B

I'm looking for an efficient way to do this that takes the date (date column) into account: a duplicate should be deleted only if the same visitorId appears in both groups on the same day.

score 0 · Answer 1 · answered Sep 21 '22 at 11:06

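Try drop_duplicates(subset=['visitorId', 'date']). By default it keeps the first row for each visitorId/date pair and drops the later ones, so the row order decides which group survives: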

print(df)

    visitorId        date group
0  4256040402  2019-08-31     A
1  4256040402  2019-08-31     B
2  4256040402  2019-08-27     A
3  4256040402  2019-08-20     B

df = df.drop_duplicates(subset=['visitorId', 'date'])

print(df)

    visitorId        date group
0  4256040402  2019-08-31     A
2  4256040402  2019-08-27     A
3  4256040402  2019-08-20     B
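
Note that drop_duplicates on ['visitorId', 'date'] also removes repeated rows when a visitor shows up twice on the same day within a single group. If that matters, here is a minimal sketch of a stricter variant, assuming the column names from the question; the names n_groups, is_repeat and result are only illustrative. It drops a repeated row only when its visitorId/date pair spans more than one group:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "visitorId": [4256040402, 4256040402, 4256040402, 4256040402],
    "date": ["2019-08-31", "2019-08-31", "2019-08-27", "2019-08-20"],
    "group": ["A", "B", "A", "B"],
})

# Number of distinct groups each (visitorId, date) pair appears in.
n_groups = df.groupby(["visitorId", "date"])["group"].transform("nunique")

# True for every row that repeats an earlier (visitorId, date) pair.
is_repeat = df.duplicated(subset=["visitorId", "date"], keep="first")

# Drop a repeat only if its pair occurs in more than one group.
result = df[~((n_groups > 1) & is_repeat)]
print(result)
```

On the sample data this keeps the same rows as drop_duplicates (index 0, 2 and 3). The two approaches differ only when a visitor has several rows on one day that all belong to the same group: this variant leaves those rows in place.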
