
Given a dataframe as follows:

  city district  year  price
0   bj       cy  2018    NaN
1   bj       cy  2019    6.0
2   sh       hp  2018    4.0
3   sh       hp  2019    3.0
4   bj       hd  2018    7.0
5   bj       hd  2019    NaN

How can I group by `city` and `district`, and keep the rows of every group that contains at least one `NaN` price? Thank you.

The output I need looks like this:

  city district  year  price
0   bj       cy  2018    NaN
1   bj       cy  2019    6.0
2   bj       hd  2018    7.0
3   bj       hd  2019    NaN

I have tried `df.groupby(['city', 'district']).filter(lambda df: df[df['price'].isnull()])`, but it doesn't work.

ah bon

  • your filter could work with this: `df.groupby(['city','district']).filter(lambda x: x.price.isna().any())` – sammywemmy Apr 23 '20 at 09:15

1 Answer


Use GroupBy.transform to test whether a group contains at least one NaN, i.e. at least one True per group:

# flag rows with a NaN price, then broadcast the group-level test back
# to every row: transform('any') is True for all rows of any group that
# contains at least one NaN
mask = (df.assign(test = df['price'].isnull())
          .groupby(['city', 'district'])['test']
          .transform('any'))
df = df[mask]

print (df)
  city district  year  price
0   bj       cy  2018    NaN
1   bj       cy  2019    6.0
4   bj       hd  2018    7.0
5   bj       hd  2019    NaN

If performance is not important, or the DataFrame is small, it is possible to use DataFrameGroupBy.filter:

df = df.groupby(['city','district']).filter(lambda x: x['price'].isnull().any())
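
Note that `filter` must return a single boolean per group, which is why the original attempt (a lambda returning a DataFrame) failed. Both solutions keep the original index (`0, 1, 4, 5`); if the fresh `0`–`3` index shown in the expected output matters, one possible follow-up:

df = df.reset_index(drop=True)   # discard the old 0, 1, 4, 5 labels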
jezrael

  • and how to filter the grouped rows without `NaN`s? – ah bon Apr 23 '20 at 09:20
  • @ahbon - Simplest is to invert the mask: `df1 = df[~mask]` (see the sketch after these comments) – jezrael Apr 23 '20 at 09:21
  • Sorry, one more question: what if I need to exclude groups whose `year` is `2019` and `price` is `NaN`? – ah bon Apr 23 '20 at 10:12
  • @ahbon - What about changing `df.assign(test = df['price'].isnull())` to `df.assign(test = df['price'].isnull() & df['year'].eq(2019))`? (sketch below) – jezrael Apr 23 '20 at 10:13
  • Is it possible to do it with your second solution? – ah bon Apr 23 '20 at 10:14
  • @ahbon - yep, `lambda x: x['price'].isnull().any() and df['year'].eq(2019).any()`? There is `and` because these are scalars – jezrael Apr 23 '20 at 10:15
  • It raises an error: `TypeError: bad operand type for unary ~: 'float'`. – ah bon Apr 23 '20 at 10:27
  • @ahbon - One idea, can you test `df = df.groupby(['city','district']).filter(lambda x: (x['price'].isnull() & x['year']).any())` ? – jezrael Apr 23 '20 at 10:35
  • No missing values for that column, I have checked. In fact, in the real data it's a date column that I converted to string, so I tried `df['price'].isnull() & df['date'].str.contains('2019-04-01')` and `df['price'].isnull() & df['date'].isin(['2019-04-01'])`; neither works. – ah bon Apr 23 '20 at 10:38
  • I put another question here: https://stackoverflow.com/questions/61385609/filter-grouped-rows-based-on-multiple-conditions-in-pandas – ah bon Apr 23 '20 at 11:01
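
A minimal, self-contained sketch of the two variants discussed in the comments above (keeping only groups without NaN, and excluding groups whose price is NaN in 2019), assuming `year` is an integer column as in the question; the names `no_nan`, `bad`, `drop` and `kept` are illustrative only:

import numpy as np
import pandas as pd

df = pd.DataFrame({'city': ['bj', 'bj', 'sh', 'sh', 'bj', 'bj'],
                   'district': ['cy', 'cy', 'hp', 'hp', 'hd', 'hd'],
                   'year': [2018, 2019, 2018, 2019, 2018, 2019],
                   'price': [np.nan, 6.0, 4.0, 3.0, 7.0, np.nan]})

# 1) Keep only groups WITHOUT any NaN price: invert the mask.
mask = df['price'].isnull().groupby([df['city'], df['district']]).transform('any')
no_nan = df[~mask]                       # sh/hp rows only

# 2) Exclude groups that have a NaN price in year 2019: build the
#    row-wise condition first, then test it per group.
bad = df['price'].isnull() & df['year'].eq(2019)
drop = bad.groupby([df['city'], df['district']]).transform('any')
kept = df[~drop]                         # drops bj/hd, keeps bj/cy and sh/hp

# Equivalent filter version; note x['price'] and x['year'] (not df[...]),
# and the row-wise & inside a single .any():
kept2 = df.groupby(['city', 'district']).filter(
    lambda x: not (x['price'].isnull() & x['year'].eq(2019)).any())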