2

I want to aggregate a pandas df column by 2 criteria.

1) First I want to groupby col1

2) I want the mean of col3 only if col2 has minimum one row of matching 0 AND minimum one row what matches 1.

I tried some combination of this but this does not work of course:

df.groupby(['col1'])['col2'].isin([0 & 1]).col3.mean()

So for example if this is my df:

df = pd.DataFrame({
'col1' : ['a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'],
'col2' : [ 0, 0, 1, 0, 1, 1, 1, 0, 1, 0],
'col3' : [ 3, 4, 2, 4, 1, 2, 3, 5, 2, 1]
})

>>>
    col1    col2    col3
0    a       0        3
1    a       0        4
2    b       1        2
3    b       0        4
4    c       1        1
5    c       1        2
6    c       1        3
7    d       0        5
8    d       1        2
9    d       0        1

What I want to see is:

col1    col2    mean(col3)
b        1        2
b        0        4
d        0        3   
d        1        2
Geveze
  • 393
  • 3
  • 12

1 Answers1

2

Your main question is one of filtering. There are a few ways to do this but one way is to compute the nunique count per group.

u = df[df.groupby('col1').col2.transform('nunique').gt(1)]

  col1  col2  col3
2    b     1     2
3    b     0     4
7    d     0     5
8    d     1     2
9    d     0     1

If col2 has 0s and 1s in a group, the nunique count will be greater than 1.

Now, call groupby as usual.

u.groupby(['col1', 'col2'], as_index=False).col3.mean()

  col1  col2  col3
0    b     0     4
1    b     1     2
2    d     0     3
3    d     1     2
cs95
  • 379,657
  • 97
  • 704
  • 746
  • what if col2 would be arbitrary 2 values like 34 an 77 or string like ‘y’ and ‘p’? – Geveze Feb 08 '19 at 23:42
  • @Geveze I've already provided a solution for that, but I have edited my answer so it is clear. Thanks. – cs95 Feb 08 '19 at 23:43