Pandas interpolation type when method='index'?

Question

The pandas documentation indicates that when method='index', the numerical values of the index are used. However, I haven't found any indication of the underlying interpolation method employed. It looks like it uses linear interpolation. Can anyone confirm this definitively or point me to where this is stated in the documentation?

Mr. For Example · Accepted Answer · 2021-01-05T02:41:32.453

It turns out the documentation is a bit misleading. Someone who reads this line:

‘index’, ‘values’: use the actual numerical values of the index.

is likely to conclude that the NaN values are filled with the numerical values of the index, which is not correct. It should be read as: interpolate linearly, using the actual numerical values of the index as the x-coordinates.

In the source code of pandas.DataFrame.interpolate, the difference between method='linear' and method='index' comes down mainly to the following code:

if method == "linear":
    # prior default
    index = np.arange(len(obj.index))
    index = Index(index)
else:
    index = obj.index

So if the dataframe uses the default RangeIndex, method='linear' and method='index' give the same interpolation results. If you specify a different index, for example one with unevenly spaced values, the results differ. The following example shows the difference clearly:

import pandas as pd
import numpy as np

d = {'val': [1, np.nan, 3]}
df0 = pd.DataFrame(d)
df1 = pd.DataFrame(d, [0, 1, 6])

print("df0:\nmethod_index:\n{}\nmethod_linear:\n{}\n".format(df0.interpolate(method='index'), df0.interpolate(method='linear')))
print("df1:\nmethod_index:\n{}\nmethod_linear:\n{}\n".format(df1.interpolate(method='index'), df1.interpolate(method='linear')))

Outputs:

df0:
method_index:
   val
0  1.0
1  2.0
2  3.0
method_linear:
   val
0  1.0
1  2.0
2  3.0

df1:
method_index:
   val
0  1.000000
1  1.333333
6  3.000000
method_linear:
   val
0  1.0
1  2.0
6  3.0

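As you can see, when index=[0, 1, 6] and val=[1.0, NaN, 3.0], method='index' fills the value at index 1 with 1.0 + (3.0 - 1.0) * (1 - 0) / (6 - 0) = 1.333333, whereas method='linear' treats the rows as equally spaced and gives 2.0.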
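The arithmetic above can be reproduced without pandas. The following is only a sketch: the helper fill_nan is a hypothetical name introduced for illustration, and it assumes the x-coordinates are already sorted. It applies np.interp once with the index values as x-coordinates and once with the row positions:

```python
import numpy as np

# Values from the df1 example: one missing entry in the middle.
y = np.array([1.0, np.nan, 3.0])

# x-coordinates each method would use (assumption: mirrors the snippet above).
x_index = np.array([0.0, 1.0, 6.0])       # method='index': actual index values
x_linear = np.arange(len(y), dtype=float)  # method='linear': positions 0, 1, 2


def fill_nan(x, y):
    """Hypothetical helper: fill NaNs in y by linear interpolation over sorted x."""
    valid = ~np.isnan(y)
    out = y.copy()
    out[~valid] = np.interp(x[~valid], x[valid], y[valid])
    return out


by_index = fill_nan(x_index, y)
by_linear = fill_nan(x_linear, y)

print(by_index)   # middle value is 1 + 2 * (1 / 6)
print(by_linear)  # middle value is 2.0
```

Only the x-coordinates differ between the two calls; the interpolation itself is the same straight-line rule in both.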

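Following the call chain through the pandas source code (generic.py -> managers.py -> blocks.py -> missing.py), we find the implementation. For every method in NP_METHODS, including 'index', the values are computed by np.interp, which performs one-dimensional piecewise linear interpolation, with the actual numerical values of the index as the x-coordinates: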

NP_METHODS = ["linear", "time", "index", "values"]

if method in NP_METHODS:
    # np.interp requires sorted X values, #21037
    indexer = np.argsort(inds[valid])
    result[invalid] = np.interp(
        inds[invalid], inds[valid][indexer], yvalues[valid][indexer]
    )
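One way to check this empirically is to compare the pandas result with a direct np.interp call on the index values. This is a minimal sketch, assuming a numeric, already sorted index; the series s and its values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up series with unevenly spaced index values and two gaps.
s = pd.Series([1.0, np.nan, np.nan, 10.0], index=[0, 2, 3, 9])

x = s.index.to_numpy(dtype=float)  # index values used as x-coordinates
y = s.to_numpy()
valid = ~np.isnan(y)

# Fill the gaps directly with np.interp (piecewise linear interpolation).
expected = y.copy()
expected[~valid] = np.interp(x[~valid], x[valid], y[valid])

# Let pandas fill the same gaps with method='index'.
result = s.interpolate(method='index').to_numpy()

print(result)
print(np.allclose(result, expected))
```

If the two arrays agree, the method='index' result on this input is exactly the straight-line interpolation that np.interp computes over the index values.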

So, you have confirmed what I saw: namely that both scenarios seem to use linear (i.e., straight-line) interpolation but with different values for the x-axis. My question was where this is mentioned in the documentation. — rhz, Jan 04 '21 at 07:33
The documentation doesn't explain it in detail, which is why I showed you the difference in the source code. You said "confirm this definitively or point me to where this is stated in the documentation"; I can't point you to something that doesn't exist. — Mr. For Example, Jan 04 '21 at 08:29
The code you posted shows the different choices of x-axis values used in the two interpolation methods, but not the type of interpolation applied. For "linear" it seems evident that linear interpolation is used. For "index", it's not obvious what interpolation method is used. Both of us have seen empirically that some form of linear interpolation is used in both cases. Can you point to the place in the code that shows that linear (straight-line) interpolation is used when method='index'? — rhz, Jan 04 '21 at 17:23
