
I want to keep all the rows of the groups that meet a condition at least once. In other words, I want to drop the groups that do not meet the condition at least once.

I've been looking around for a few hours and could not find a solution. This was the closest I got but I could not implement the answer.

I have the following data frame:

test = pd.DataFrame({"Gr":[1,1,2,2],"Bk":[9,1,8,5]})
print(test)

   Gr  Bk
0   1   9
1   1   1
2   2   8
3   2   5

I want to group by test["Gr"] and select all the groups where test["Bk"] == 9 at least once to get to this:

# Drop Gr 2 because none of its rows meet Bk == 9.
   Gr  Bk
0   1   9
1   1   1

I would have thought this could be easily achieved by combining groupby() and .any() without the need for lambda functions.

I tried this:

test.groupby("Gr").filter(lambda x: (x.Bk == 9).all())
Arturo Sbr

6 Answers


Easy to understand filter

test.groupby('Gr').filter(lambda x : x['Bk'].eq(9).any())
   Gr  Bk
0   1   9
1   1   1
BENY

Using GroupBy.transform with eq and any to check, within each group, whether any of the values are equal (eq) to 9:

mask = test.groupby('Gr')['Bk'].transform(lambda x: x.eq(9).any())
test[mask]

Output

   Gr  Bk
0   1   9
1   1   1
Erfan
  • This works perfectly, but why is the lambda part necessary? – Arturo Sbr Aug 19 '19 at 00:15
  • 1
    Since we use `transform`, we get back an equal-length array of `True` and `False` which indicates whether each row should be returned in our result. The lambda is necessary since we apply multiple pandas methods (`eq` and `any`). – Erfan Aug 19 '19 at 00:31
  • Thanks man. One last question: are lambda functions executed row by row? – Arturo Sbr Aug 19 '19 at 13:19
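Building on the comment thread above, the lambda can in fact be avoided by computing the boolean comparison first and then grouping that Series by the key (a sketch, not from the original answer; `'any'` here is passed as a named reduction to transform):

```python
import pandas as pd

test = pd.DataFrame({"Gr": [1, 1, 2, 2], "Bk": [9, 1, 8, 5]})

# Compute the comparison once, then group the resulting boolean
# Series by Gr and broadcast each group's `any` back to its rows.
mask = test["Bk"].eq(9).groupby(test["Gr"]).transform("any")
result = test[mask]
```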

There is a possibility without groupby: check all entries for equality to 9, get the groups of those entries, and slice the whole dataframe down to the extracted groups.

test[test.Gr.isin(test[test['Bk'].eq(9)].Gr)]

result

    Gr  Bk
0   1   9
1   1   1
pythonic833

You can do:

test = test[test['Gr'].apply(lambda x: x in [key for key in test['Gr'][test['Bk'].eq(9)]])]
test

Output:

    Gr  Bk
0   1   9
1   1   1
ansev

Pretty straightforward approach here:

Check where test['Bk'] is equal to 9 and get the corresponding values in test['Gr'], then reduce the df down to only the rows where those values of test['Gr'] occur.

test[test['Gr'].isin(test[test['Bk']==9]['Gr'])]

result :

   Gr  Bk
0   1   9
1   1   1
Derek Eden

Assuming there is only a single Bk == 9 in each group of Gr, this is a crazy and overkill way using merge :D

test.merge(test.loc[test.Bk.eq(9),'Gr'], on='Gr')

Out[227]:
   Gr  Bk
0  1   9
1  1   1

Note: it still works with multiple 9s per group; it just needs drop_duplicates, but I think at that point it gets too complicated and not fun anymore.
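As a sketch of the drop_duplicates variant mentioned in the note (the repeated 9 in group 1 is an assumed example, not data from the question):

```python
import pandas as pd

# Assumed example with a duplicated Bk == 9 inside group 1.
test = pd.DataFrame({"Gr": [1, 1, 2, 2], "Bk": [9, 9, 8, 5]})

# Deduplicate the matching group keys first so the merge does not
# multiply each left-hand row once per matching 9 in its group.
keys = test.loc[test["Bk"].eq(9), "Gr"].drop_duplicates()
out = test.merge(keys, on="Gr")
```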

Andy L.