I am looking for a way to interpolate between two values (A and G) such that the interpolated values sum to the second value (G), preferably with the values spread linearly, i.e. in equally sized parts.

What I have is:

Label Value
A 0
B NaN
C NaN
D NaN
E NaN
F NaN
G 10

... and I want to get to this:

Label Value
A 0
B 2
C 2
D 2
E 2
F 2
G 10
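To pin down the target rule: a run of n missing rows that ends in a known value G gets G / n in each row, so the filled rows sum to G (here 10 / 5 = 2). The plain-Python loop below is a hypothetical reference helper (`fill_even_split` is not from the question) that encodes that rule. It is exactly the row-by-row approach the question hopes to avoid, so treat it only as a way to check results on small inputs.

```python
import math

def fill_even_split(values):
    """Hypothetical reference helper: fill each run of NaNs with an equal
    share of the known value that ends the run, so the shares sum to it."""
    result = list(values)
    pending = []  # indices of NaNs seen since the last known value
    for i, v in enumerate(values):
        if isinstance(v, float) and math.isnan(v):
            pending.append(i)
        else:
            # Known value reached: split it evenly over the pending NaNs
            for j in pending:
                result[j] = v / len(pending)
            pending = []
    return result  # trailing NaNs with no closing value stay NaN

nan = float("nan")
print(fill_even_split([0, nan, nan, nan, nan, nan, 10]))
# [0, 2.0, 2.0, 2.0, 2.0, 2.0, 10]
```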

Unfortunately, pandas' interpolate method (DataFrame.interpolate) does not support this. I could manually build each section with something like numpy.linspace, but that seems like a makeshift solution and would not be particularly efficient for larger tables where the rows that need interpolation are scattered irregularly.

What is the most efficient way to do this in Python?

MichlF

1 Answer

I don't know if this is the most efficient way, but it works for any number of NaN gaps, including none, using only numpy and pandas:

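import numpy as np
import pandas as pd
df['break'] = np.where(df['Value'].notnull(), 1, 0)  # 1 = known value, 0 = NaN to fill
df['group'] = df['break'].shift().fillna(0).cumsum()  # each known value closes its group
df['Value'] = df.groupby('group')['Value'].transform(  # transform keeps df's index aligned
    lambda x: x.fillna(x.max() / (len(x) - 1)))  # split the known value over the NaN rows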

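You will get a couple of RuntimeWarnings from the underlying numpy division, but the replacement still works. They come from single-row groups (a known value with no NaNs before it, such as row A), where len(x) - 1 is 0: 0 / 0 triggers the "invalid value" warning, and a nonzero value divided by 0 triggers the "divide by zero" warning. Those groups contain no NaNs, so the resulting NaN or inf is never filled in.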

RuntimeWarning: invalid value encountered in double_scalars

RuntimeWarning: divide by zero encountered in double_scalars
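If the warnings are a nuisance, or the per-group Python lambda gets slow on large tables, the same grouping can be expressed with built-in aggregations only. This is a variant sketch, not part of the original answer: it broadcasts each group's known value (transform("max")) and its number of NaN rows (transform("sum") over a missing-value mask) back to every row, then divides. The only 0 / 0 results land on rows that already hold a value, so fillna never uses them, and pandas' own Series arithmetic should not surface numpy's RuntimeWarning. No timing comparison is claimed; it simply avoids calling Python code once per group.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the question's table (assumed construction)
df = pd.DataFrame({
    "Label": list("ABCDEFG"),
    "Value": [0, np.nan, np.nan, np.nan, np.nan, np.nan, 10],
})

known = df["Value"].notna()
# Same grouping as the answer: each known value closes the run of NaNs before it
group = known.astype(int).shift(fill_value=0).cumsum()

# Per-group known value and number of NaN rows, broadcast back to every row
target = df["Value"].groupby(group).transform("max")
n_missing = (~known).astype(int).groupby(group).transform("sum")

# NaN rows always have n_missing >= 1; 0 / 0 only occurs on rows fillna skips
df["Value"] = df["Value"].fillna(target / n_missing)

print(df["Value"].tolist())
# [0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 10.0]
```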

skabo