In pandas: Interpolate between two rows such that the sum of interpolated values is equal to the second row

Question

I am looking for a way to interpolate between two values (A and G) such that the sum of the interpolated values is equal to the second value (G), preferably while the distances between the interpolated values are linear/equally-sized.

What I have is:

Label	Value
A	0
B	NaN
C	NaN
D	NaN
E	NaN
F	NaN
G	10

... and I want to get to this:

Label	Value
A	0
B	2
C	2
D	2
E	2
F	2
G	10

Unfortunately, pandas.interpolate does not allow for this. I could fill these sections manually using something like numpy.linspace, but that seems like a makeshift solution and not particularly efficient for larger tables where the rows that require interpolation are irregularly scattered.

What is the most efficient way to do this in Python?

Multiple gaps. Essentially, I am looking for a method that behaves like `pandas.interpolate` but with the behavior described above. - MichlF, Dec 10 '21 at 14:48

score 0 · Answer 1 · answered Dec 10 '21 at 14:58

I don't know if this is the most efficient way but it works for any number of breaks, including none, using only numpy and pandas:

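import numpy as np
import pandas as pd

df['break'] = np.where(df['Value'].notnull(), 1, 0)
df['group'] = df['break'].shift().fillna(0).cumsum()
# transform keeps the original index, so the result aligns with df on assignment
df['Value'] = df.groupby('group')['Value'].transform(lambda x: x.fillna(x.max() / (len(x) - 1)))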

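You will get a couple of warnings from the underlying numpy calculations. They come from groups that contain a single row: there len(x) - 1 is 0, so the division produces NaN (for 0 / 0) or inf. Those groups have no NaN to fill, so the result is never used and the replacement still works.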

RuntimeWarning: invalid value encountered in double_scalars

RuntimeWarning: divide by zero encountered in double_scalars
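If the warnings are a nuisance, one possible variant is to count the missing rows per group and divide whole columns instead of calling a lambda per group. This is only a sketch under my own assumptions: `fill_equal_shares` is a hypothetical helper name, not a pandas function, and it assumes each run of NaNs should be split evenly against the next known value below it.

```python
import numpy as np
import pandas as pd

def fill_equal_shares(values: pd.Series) -> pd.Series:
    """Hypothetical helper: fill each run of NaNs with equal shares of the
    next known value, so the filled values sum to that value."""
    known = values.notna()
    # Each known value closes the group of NaNs directly above it
    group = known.shift(fill_value=False).astype(int).cumsum()
    # Number of missing rows in each group, broadcast back to every row
    n_missing = values.isna().astype(int).groupby(group).transform("sum")
    # The closing (only known) value of each group, broadcast to every row
    closing = values.groupby(group).transform("max")
    # Groups without missing rows get NaN as divisor, so nothing is divided by zero
    share = closing / n_missing.where(n_missing > 0)
    return values.fillna(share)

single_gap = pd.Series([0, np.nan, np.nan, np.nan, np.nan, np.nan, 10])
print(fill_equal_shares(single_gap).tolist())
# [0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 10.0]

several_gaps = pd.Series([0, np.nan, np.nan, 6, np.nan, 4, 5])
print(fill_equal_shares(several_gaps).tolist())
# [0.0, 3.0, 3.0, 6.0, 4.0, 4.0, 5.0]
```

A run of NaNs at the very end of the column has no closing value, so it is left as NaN. I have not benchmarked this against the groupby-with-lambda version.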
