1

I have values in a group and if the value falls within the range 5 to 25, then I want to keep this group in the data.

Based on Pandas: remove group from the data when a value in the group meets a required condition, I wrote this:

dfnew = df.groupby('groupname').filter(lambda x: (x['column2']>=5) & (x['column2']<=25))

When I use this, there's this error: filter function returned a Series, but expected a scalar bool

Then I also tried:

dfnew = df.groupby('groupname').filter(lambda x: 5<= x['column2']<=25)

But it gave error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

And then I tried:

dfnew = df.groupby('groupname').filter(lambda x: (x['column2'].any()>=5) & (x['column2'].any()<=25))

Which just returns an empty dataframe with the column names

I am very new to python and datascience (literally coded for a few days). Please explain what's going on and help! Thank you so much!!

1 Answers1

1

I think you were almost there. You need to use max and min to test the values in the group. Here is an example with a toy dataset.

First the data:

import pandas as pd

data = pd.DataFrame(
    {
        'id': [1, 2, 3] * 3,
        'value': [3, 20, 21, 6, 24, 7, 21, 8, 50]
    }
)

data

Which gives us:

    id  value
0   1   3
1   2   20
2   3   21
3   1   6
4   2   24
5   3   7
6   1   21
7   2   8
8   3   50

Then use a group/filter pattern to retain only the groups where the min value in the group is greater than or equal to 5, and the max value in the group is less than or equal to 25. In this case, we expect only group 2 to be returned.

data.groupby('id').filter(lambda x: (x['value'].max() <= 25) & (x['value'].min() >= 5))

And that is what we get:

id  value
1   2   20
4   2   24
7   2   8
smj
  • 1,264
  • 1
  • 7
  • 14