Dropna when another row has the missing data OR drop_duplicates with NaN matching all data

Question

I have data like the following:

Index  ID    data1  data2 ...
0      123   0      NaN   ...
1      123   0      1     ...
2      456   NaN    0     ...
3      456   NaN    0     ...
...

I need to drop every row that carries no more information than another, otherwise identical row with the same ID.

In the example above, row 0 and exactly one of rows 2 and 3 should be removed.

My best attempt so far is rather slow and also does not work:

df.groupby(by='ID').fillna(method='ffill',inplace=True).fillna(method='bfill',inplace=True)
df.drop_duplicates(inplace=True)

How can I best accomplish this goal?

score 2 · Accepted Answer · answered Jan 16 '20 at 10:57

Your approach seems fine; the in-place assignment just doesn't work here, since you're assigning to a copy of the data. Assign the result back instead:

df = df.groupby(by='ID', as_index=False).fillna(method='ffill').fillna(method='bfill')

df.drop_duplicates()

   ID   data1  data2
0  123    0.0    1.0
2  456    NaN    0.0
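For reference, here is a minimal, self-contained sketch of the same fill-then-deduplicate idea, rebuilt on the four sample rows from the question. It makes two assumptions that are not part of the answer above: recent pandas releases deprecate `fillna(method=...)` and `groupby(...).fillna`, so it uses `ffill()`/`bfill()` instead, and it runs both fills inside `transform` so the backward fill also stays within each ID (in the chained call above, the second `fillna` runs on the whole frame). The names `value_cols`, `filled` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "ID": [123, 123, 456, 456],
    "data1": [0, 0, np.nan, np.nan],
    "data2": [np.nan, 1, 0, 0],
})

# Every column except the grouping key is treated as data to fill.
value_cols = [c for c in df.columns if c != "ID"]

# Forward- then backward-fill each column within its own ID group,
# so values never leak from one ID into another.
filled = df.copy()
filled[value_cols] = df.groupby("ID")[value_cols].transform(
    lambda s: s.ffill().bfill()
)

# Rows that are identical after filling collapse to one;
# drop_duplicates treats NaN values in the same position as equal.
result = filled.drop_duplicates()
print(result)
```

Note that filling only gives the intended result when the rows of one ID really are identical apart from their missing values, as in the question. If two rows of the same ID disagree on a non-missing value, filling would combine them into rows that never existed in the data.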
