
I'm having difficulty preventing pd.DataFrame.interpolate(method='index') from extrapolating.

Specifically:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({1: range(1, 5), 2: range(2, 6), 3: range(3, 7)}, index=[1, 2, 3, 4])
>>> df = df.reindex(range(6)).reindex(range(5), axis=1)
>>> df.iloc[3, 2] = np.nan
>>> df
    0    1    2    3   4
0 NaN  NaN  NaN  NaN NaN
1 NaN  1.0  2.0  3.0 NaN
2 NaN  2.0  3.0  4.0 NaN
3 NaN  3.0  NaN  5.0 NaN
4 NaN  4.0  5.0  6.0 NaN
5 NaN  NaN  NaN  NaN NaN

So df is just a block of data surrounded by NaN, with one interior missing point at iloc[3, 2]. When I apply .interpolate() (along either the horizontal or the vertical axis), I want ONLY that interior point filled, leaving the surrounding NaNs untouched, but I can't get it to work.

I tried:

>>> df.interpolate(method='index', axis=0, limit_area='inside')
    0    1    2    3   4
0 NaN  NaN  NaN  NaN NaN
1 NaN  1.0  2.0  3.0 NaN
2 NaN  2.0  3.0  4.0 NaN
3 NaN  3.0  4.0  5.0 NaN
4 NaN  4.0  5.0  6.0 NaN
5 NaN  4.0  5.0  6.0 NaN

Note that the last row got filled, which is undesirable. (Incidentally, if it is going to extrapolate at all, I'd expect a linear extrapolation based on the index, but it simply pads the last valid value, which is even less desirable.)
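The padding (rather than linear extrapolation) is consistent with how numpy.interp behaves. This is an assumption not confirmed in the thread: that pandas 0.21 evaluates method='index' by calling np.interp over the known points. np.interp never extrapolates; outside the known range it returns the first or last known value. A minimal illustration using column 1 of the question's frame:

```python
import numpy as np

# Known (index label, value) pairs for column 1 of df: labels 1..4 hold 1.0..4.0.
xp = [1, 2, 3, 4]
fp = [1.0, 2.0, 3.0, 4.0]

# Label 5 lies beyond the last known point; np.interp clamps to fp[-1]
# instead of extrapolating, i.e. it "pads" the last value.
print(np.interp(5, xp, fp))  # 4.0

# Left of the first known point it likewise returns fp[0] (unless `left=` is given).
print(np.interp(0, xp, fp))  # 1.0
```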

I also tried various combinations of limit and limit_direction, to no avail.

What is the correct combination of arguments to get the desired result? Ideally without some contorted masking (though that would work too). Thanks.
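For reference, here is a sketch of the intended result, assuming a pandas release that actually honours limit_area (the comments below report it working on 0.24.2 and 1.0.3). It rebuilds the question's frame and interpolates along each axis with limit_area='inside', which only fills NaNs that have valid values on both sides:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: a 4x3 block of data padded with NaN on every side.
df = pd.DataFrame({1: range(1, 5), 2: range(2, 6), 3: range(3, 7)}, index=[1, 2, 3, 4])
df = df.reindex(range(6)).reindex(range(5), axis=1)
df.iloc[3, 2] = np.nan  # the single interior gap

# limit_area='inside' restricts filling to NaNs surrounded by valid values.
by_rows = df.interpolate(method='index', axis=0, limit_area='inside')
by_cols = df.interpolate(method='index', axis=1, limit_area='inside')

print(by_rows.iloc[3, 2], by_cols.iloc[3, 2])  # 4.0 4.0
print(by_rows.iloc[5].isna().all())            # True: bottom border untouched
```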

Zhang18
  • Are you sure you didn't accidentally assign a prior result (computed without `limit_area`) back to `df` and then run the above code with `limit_area`? Using only your 4 lines of code, the last row should correctly stay `NaN`. – ALollz May 08 '20 at 19:15
  • Also, what version of `pandas` are you using? The `limit_area` argument has had quite a few bugs, so that might be the issue. – ALollz May 08 '20 at 19:18
  • Works as expected for me on both pandas `0.24.2` and `1.0.3`. – Quang Hoang May 08 '20 at 19:23
  • Ok, turns out I'm running this on Pandas 0.21, hence the `limit_area` is silently failing. Looks like starting from 0.24 this is fixed. Case closed. – Zhang18 May 09 '20 at 20:44

1 Answer


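Ok, turns out I'm running this on pandas 0.21, which predates the limit_area argument (it was added in 0.23), so the argument is silently ignored rather than raising an error. On newer versions it works as expected (confirmed on 0.24.2 and 1.0.3 in the comments). Case closed.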
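If upgrading isn't an option, the masking route mentioned in the question is not too contorted. Below is a minimal sketch; `interpolate_inside` is a hypothetical helper name, and it has been reasoned about against current pandas, not tested on 0.21. It interpolates normally, then blanks every cell that has no valid value before it or no valid value after it along the chosen axis:

```python
import numpy as np
import pandas as pd


def interpolate_inside(frame, axis=0):
    """Hypothetical helper: index-based interpolation that only fills interior gaps.

    Workaround sketch for pandas versions that ignore `limit_area`.
    """
    filled = frame.interpolate(method='index', axis=axis)
    # A cell is "inside" if a valid value exists before it (forward-fill reaches it)
    # AND a valid value exists after it (backward-fill reaches it) along `axis`.
    inside = frame.ffill(axis=axis).notna() & frame.bfill(axis=axis).notna()
    # Keep interpolated values only inside; everything else reverts to NaN.
    return filled.where(inside)


# The question's frame: data block surrounded by NaN, one interior gap.
df = pd.DataFrame({1: range(1, 5), 2: range(2, 6), 3: range(3, 7)}, index=[1, 2, 3, 4])
df = df.reindex(range(6)).reindex(range(5), axis=1)
df.iloc[3, 2] = np.nan

result = interpolate_inside(df, axis=0)
print(result.iloc[3, 2])              # 4.0
print(result.iloc[5].isna().all())    # True
```

Using ffill/bfill to build the mask avoids computing first_valid_index/last_valid_index per column or row by hand, and the same two lines handle both axis=0 and axis=1.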

Zhang18