I have data like the following:

Index  ID    data1  data2 ...
0      123   0      NaN   ...
1      123   0      1     ...
2      456   NaN    0     ...
3      456   NaN    0     ...
...

I need to drop every row that carries no more information than another, otherwise identical row (same ID, and the same values wherever both rows are non-NaN).

In the example above, row 0 and exactly one of rows 2 and 3 should be removed.

My best attempt so far is rather slow and also does not work:

df.groupby(by='ID').fillna(method='ffill',inplace=True).fillna(method='bfill',inplace=True)
df.drop_duplicates(inplace=True)

How can I best accomplish this goal?

Isaac

1 Answer


Your approach seems fine; the in-place fill just doesn't work here, since it modifies a copy of the data rather than df itself. Assign the result back instead:

df = df.groupby(by='ID', as_index=False).fillna(method='ffill').fillna(method='bfill')

df.drop_duplicates()

   ID   data1  data2
0  123    0.0    1.0
2  456    NaN    0.0
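
For reference, here is a minimal sketch of the same idea for recent pandas releases, which deprecate the `method=` argument of `fillna` in favour of `ffill()` and `bfill()`. It also runs both fills per ID, whereas the second `fillna` in the chained call above runs on the whole frame and could pull values in from the next ID. The sample frame and the names `value_cols`, `filled` and `result` are assumptions made for illustration. Note that this fills gaps before deduplicating, so two rows of the same ID with complementary gaps end up merged rather than dropped.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question (assumed; the real frame has more columns).
df = pd.DataFrame({
    "ID": [123, 123, 456, 456],
    "data1": [0, 0, np.nan, np.nan],
    "data2": [np.nan, 1, 0, 0],
})

# Every column except the grouping key gets filled.
value_cols = [c for c in df.columns if c != "ID"]

filled = df.copy()
# Forward-fill within each ID group, then back-fill within each ID group,
# so values never cross from one ID into another.
filled[value_cols] = filled.groupby("ID")[value_cols].ffill()
filled[value_cols] = filled.groupby("ID")[value_cols].bfill()

# Rows that became identical after filling collapse to one
# (drop_duplicates treats NaN values as equal to each other).
result = filled.drop_duplicates()
print(result)
```

On the sample data this keeps the rows with index 0 and 2, matching the output above.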
yatu