How to pad/ffill NaN values in a Pandas dataframe using "inside" as limit_area?

Question

I have a Pandas dataframe containing several NaNs like the following one

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1,np.nan,np.nan, 2, 3, np.nan], [np.nan,np.nan,0,0,np.nan,2], [np.nan,4,np.nan,np.nan,5,np.nan]]).transpose())
print(df)

       0      1      2
0    1.0    NaN    NaN
1    NaN    NaN    4.0
2    NaN    0.0    NaN
3    2.0    0.0    NaN
4    3.0    NaN    5.0
5    NaN    2.0    NaN

Now I would like to fill the NaN values subject to two restrictions:

Only fill NaNs that are surrounded by valid values (= don't replace leading or trailing NaNs)
Use method "pad" (= ffill), so each NaN is replaced by the preceding valid number in that column

Desired solution:

       0      1      2
0    1.0    NaN    NaN
1    1.0    NaN    4.0
2    1.0    0.0    4.0
3    2.0    0.0    4.0
4    3.0    0.0    5.0
5    NaN    2.0    NaN

I found the pandas interpolate function, which sounded promising, but I am only able to satisfy one of the two restrictions at a time.

When I use

df_padded = df.interpolate(method='pad')

the right values are used (the preceding number in the respective column), but the trailing NaNs at the end of columns 0 and 2 are also replaced, which I do not want.

       0      1      2
0    1.0    NaN    NaN
1    1.0    NaN    4.0
2    1.0    0.0    4.0
3    2.0    0.0    4.0
4    3.0    0.0    5.0
5    3.0    2.0    5.0

When I use

df_padded = df.interpolate(limit_area='inside')

it replaces the right NaNs, but it interpolates linearly, which is also not what I want.

         0      1        2
0    1.000    NaN      NaN
1    1.333    NaN    4.000
2    1.667    0.0    4.333
3    2.000    0.0    4.667
4    3.000    1.0    5.000
5      NaN    2.0      NaN

So I thought using both parameters in the function call would give me the right output

df_padded = df.interpolate(method='pad', limit_area='inside')

but instead nothing is happening at all.

       0      1      2
0    1.0    NaN    NaN
1    NaN    NaN    4.0
2    NaN    0.0    NaN
3    2.0    0.0    NaN
4    3.0    NaN    5.0
5    NaN    2.0    NaN

What am I doing wrong?

score 2 · Accepted Answer · answered Mar 16 '23 at 15:08

Ideally the interpolate solution should have worked; I believe this might be a bug. Nevertheless, here is an alternative approach that you can use:

s1, s2 = df.ffill(), df.bfill()
result = s1.mask(s1.isna() | s2.isna())

Result

     0    1    2
0  1.0  NaN  NaN
1  1.0  NaN  4.0
2  1.0  0.0  4.0
3  2.0  0.0  4.0
4  3.0  0.0  5.0
5  NaN  2.0  NaN
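
To spell out why this works, here is a minimal self-contained sketch of the same idea. The helper name `ffill_inside` is my own and not a pandas function. A cell has a valid value before it exactly when the forward fill is not NaN there, and a valid value after it exactly when the backward fill is not NaN there, so keeping the forward-filled value only where both hold leaves leading and trailing gaps untouched.

```python
import numpy as np
import pandas as pd


def ffill_inside(frame: pd.DataFrame) -> pd.DataFrame:
    """Forward-fill only the NaNs that have a valid value both before and after.

    Hypothetical helper for illustration; not part of pandas.
    """
    forward = frame.ffill()   # still NaN only in leading gaps
    backward = frame.bfill()  # still NaN only in trailing gaps
    # Keep the forward-filled value where both fills found a valid neighbour.
    return forward.where(forward.notna() & backward.notna())


df = pd.DataFrame({
    0: [1, np.nan, np.nan, 2, 3, np.nan],
    1: [np.nan, np.nan, 0, 0, np.nan, 2],
    2: [np.nan, 4, np.nan, np.nan, 5, np.nan],
})

result = ffill_inside(df)
print(result)
```

This uses `where` with the inverted condition instead of `mask`; the two are equivalent here.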

Shubham Sharma

@Shubham_Sharma thanks for your quick reply and for showing me an alternative solution. I might create a bug report on GitHub to resolve the issue. – pa1ric6 Mar 17 '23 at 07:43

score 1 · Answer 2 · answered Mar 16 '23 at 15:11

You just need a little trick:

# This is for generating a mask
df_padded  = df.interpolate(limit_area='inside')

# This is the forward fill with the previous valid value
# (df.ffill() is equivalent to the older df.fillna(method="ffill"))
df_fill = df.ffill()

mask = df_padded.isna()
df_fill[mask] = np.nan
df_fill

You get :

     0    1    2
0  1.0  NaN  NaN
1  1.0  NaN  4.0
2  1.0  0.0  4.0
3  2.0  0.0  4.0
4  3.0  0.0  5.0
5  NaN  2.0  NaN
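
If you would rather not compute a linear interpolation only to throw its values away, one possible variant is to restrict the forward fill, column by column, to the span between the first and last valid entry. This is only a sketch: the helper name `ffill_between_valid` is made up for illustration, and the label-based slice assumes a unique, sorted index such as the default RangeIndex.

```python
import numpy as np
import pandas as pd


def ffill_between_valid(column: pd.Series) -> pd.Series:
    """Forward-fill a column only between its first and last valid entries.

    Hypothetical helper for illustration; assumes a unique, sorted index.
    """
    first = column.first_valid_index()
    last = column.last_valid_index()
    if first is None:  # column is entirely NaN, nothing to fill
        return column
    filled = column.copy()
    # .loc slicing is inclusive on both ends, so rows first..last are covered.
    filled.loc[first:last] = column.loc[first:last].ffill()
    return filled


df = pd.DataFrame({
    0: [1, np.nan, np.nan, 2, 3, np.nan],
    1: [np.nan, np.nan, 0, 0, np.nan, 2],
    2: [np.nan, 4, np.nan, np.nan, 5, np.nan],
})

result = df.apply(ffill_between_valid)
print(result)
```

Rows before the first valid value and after the last one are never touched, so leading and trailing NaNs stay as they are.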

HMH1013

Thanks for your quick reply. The mask trick is just awesome! :) – pa1ric6 Mar 17 '23 at 07:47
