
I want to keep all the rows of the groups that meet a condition at least once. In other words, I want to drop the groups that do not meet the condition at least once.

I've been looking around for a few hours and could not find a solution. This was the closest I got but I could not implement the answer.

I have the following data frame:

test = pd.DataFrame({"Gr":[1,1,2,2],"Bk":[9,1,8,5]})
print(test)

   Gr  Bk
0   1   9
1   1   1
2   2   8
3   2   5

I want to group by test["Gr"] and select all the groups where test["Bk"] == 9 at least once to get to this:

# Drop Gr 2 because none of its rows meet Bk == 9.
   Gr  Bk
0   1   9
1   1   1

I would have thought this could be easily achieved by combining groupby() and .any() without the need for lambda functions.

I tried this:

test.groupby("Gr").filter(lambda x: (x.Bk == 9).all())
Arturo Sbr

6 Answers


Easy to understand filter

test.groupby('Gr').filter(lambda x : x['Bk'].eq(9).any())
   Gr  Bk
0   1   9
1   1   1
BENY

Using GroupBy.transform with eq and any to check, within each group, whether any of the values are equal (eq) to 9:

mask = test.groupby('Gr')['Bk'].transform(lambda x: x.eq(9).any())
test[mask]

Output

   Gr  Bk
0   1   9
1   1   1
Erfan
  • This works perfectly, but why is the lambda part necessary? – Arturo Sbr Aug 19 '19 at 00:15
  • 1
    Since we use `transform`, we get back an equal-length array of `True` and `False` which indicates whether each row should be returned in our result. The lambda is necessary since we apply multiple pandas methods (`eq` and `any`). – Erfan Aug 19 '19 at 00:31
  • Thanks man. One last question: are lambda functions executed row by row? – Arturo Sbr Aug 19 '19 at 13:19
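Building on the comment thread above, the lambda can in fact be avoided by computing the boolean comparison first and then grouping that Series by the key (a sketch, not from the original answer; `'any'` here is passed as a named reduction to transform):

```python
import pandas as pd

test = pd.DataFrame({"Gr": [1, 1, 2, 2], "Bk": [9, 1, 8, 5]})

# Compute the comparison once, then group the resulting boolean
# Series by Gr and broadcast each group's `any` back to its rows.
mask = test["Bk"].eq(9).groupby(test["Gr"]).transform("any")
result = test[mask]
```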

There is a possibility without groupby: check all entries for equality to 9, get the groups of those entries, and slice the whole dataframe down to the extracted groups.

test[test.Gr.isin(test[test['Bk'].eq(9)].Gr)]

result

    Gr  Bk
0   1   9
1   1   1
pythonic833

You can do:

test = test[test['Gr'].apply(lambda x: x in [key for key in test['Gr'][test['Bk'].eq(9)]])]
test

Output:

    Gr  Bk
0   1   9
1   1   1
ansev

Pretty straightforward approach here:

Check where test['Bk'] is equal to 9 and get the corresponding values in test['Gr'], then reduce the df down to only the rows where those values of test['Gr'] occur.

test[test['Gr'].isin(test[test['Bk']==9]['Gr'])]

result :

   Gr  Bk
0   1   9
1   1   1
Derek Eden

Assuming there is only a single Bk == 9 in each group of Gr, this is a crazy and overkill way using merge :D

test.merge(test.loc[test.Bk.eq(9),'Gr'], on='Gr')

Out[227]:
   Gr  Bk
0  1   9
1  1   1

Note: it still works with multiple 9s per group; it just needs drop_duplicates, but I think at that point it gets too complicated and not fun anymore.
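As a sketch of the drop_duplicates variant mentioned in the note (the repeated 9 in group 1 is an assumed example, not data from the question):

```python
import pandas as pd

# Assumed example with a duplicated Bk == 9 inside group 1.
test = pd.DataFrame({"Gr": [1, 1, 2, 2], "Bk": [9, 9, 8, 5]})

# Deduplicate the matching group keys first so the merge does not
# multiply each left-hand row once per matching 9 in its group.
keys = test.loc[test["Bk"].eq(9), "Gr"].drop_duplicates()
out = test.merge(keys, on="Gr")
```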

Andy L.