I have results from an A/B test that I need to evaluate, but while checking the data I noticed that some users appear in both groups, and I need to drop them so they don't skew the test. My data looks something like this:

    transactionId   visitorId   date       revenue  group
0   906125958          0        2019-08-16  10.8     B
1   1832336629         1        2019-08-04  25.9     B
2   3698129301         2        2019-08-01  165.7    B
3   4214855558         2        2019-08-07  30.5     A
4   797272108          3        2019-08-23  100.4    A

What I need to do is remove every user that was in both A and B groups while leaving the rest intact. So from the example data I need this output:

    transactionId   visitorId   date       revenue  group
0   906125958          0        2019-08-16  10.8     B
1   1832336629         1        2019-08-04  25.9     B
4   797272108          3        2019-08-23  100.4    A

I've tried various approaches and can't seem to figure it out, and I couldn't find an answer anywhere. I would really appreciate some help. Thanks in advance.

Amir Deer

1 Answer

You can get a list of users that are in just one group like this:

group_counts = df.groupby('visitorId').agg({'group': 'nunique'})  # number of distinct groups per visitor
to_include = group_counts[group_counts['group'] == 1]  # keep only visitors seen in exactly one group

And then filter your original data according to which visitors are in that list:

df = df[df['visitorId'].isin(to_include.index)]
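
For reference, here is a minimal, self-contained sketch of the two steps above run against the sample rows from the question. The variable names `filtered`, `groups_per_visitor`, and `filtered_alt` are only illustrative, and the second half shows a possible one-step variant using `transform('nunique')`, which is an alternative I'm suggesting rather than something the question asked for:

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "transactionId": [906125958, 1832336629, 3698129301, 4214855558, 797272108],
    "visitorId": [0, 1, 2, 2, 3],
    "date": ["2019-08-16", "2019-08-04", "2019-08-01", "2019-08-07", "2019-08-23"],
    "revenue": [10.8, 25.9, 165.7, 30.5, 100.4],
    "group": ["B", "B", "B", "A", "A"],
})

# Two-step version: count distinct groups per visitor, then keep the
# visitors that appear in exactly one group.
group_counts = df.groupby("visitorId").agg({"group": "nunique"})
to_include = group_counts[group_counts["group"] == 1]
filtered = df[df["visitorId"].isin(to_include.index)]

# One-step variant (illustrative): transform broadcasts the per-visitor
# count back onto every row, so it can be used directly as a row mask.
groups_per_visitor = df.groupby("visitorId")["group"].transform("nunique")
filtered_alt = df[groups_per_visitor == 1]

print(filtered)
```

On this sample, both versions drop the two rows for visitor 2 and keep the rows with index 0, 1, and 4, which matches the output requested in the question.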

Jacob
    I ran the group_counts code so many times and tried many variations of to_include, but wasn't sure how to proceed from there. Thank you so much. I still have a lot to learn to be a good DA. – Amir Deer Sep 23 '20 at 19:15