
I'm interested in performing linear interpolation using the scipy.interpolate library. The dataset looks somewhat like this: [image: DataFrame of X, Y values for different RUNs, used to build the interpolation]

I'd like to use these interpolation functions to find the missing Y values in this dataset: [image: DataFrame on which to apply the interpolation functions]

Only 3 runs are shown here, but the real dataset will run into thousands of runs. I'd therefore appreciate advice on how to build the interpolation functions iteratively, one per run.

from scipy.interpolate import interp1d

InterpolatedFunction = {}
for RUNNumber in range(TotalRuns):
    # X, Y stand for the known points belonging to run RUNNumber
    InterpolatedFunction[RUNNumber] = interp1d(X, Y)
Asked by SSM

1 Answer

As I understand it, you want a separate interpolation function defined for each run, and then you want to apply these functions to a second dataframe. I defined a dataframe, df, with columns ['X', 'Y', 'RUN'], and a second dataframe, new_df, with columns ['X', 'Y_interpolation', 'RUN'].
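To make the steps easy to follow, here is a sketch of sample inputs with that layout. The Y values in df are made up for illustration; the X and RUN values in new_df are the ones SSM posts in the comments. new_df starts with only X and RUN, and the Y_interpolation column is created when it gets filled in.

```python
import pandas as pd

# Hypothetical known points: 3 runs, each sampled on the same X grid.
# The Y values are invented purely for illustration.
df = pd.DataFrame({
    'X':   [1, 10, 100, 1000] * 3,
    'Y':   [0.0, 9.0, 99.0, 999.0,     # run 1: Y = X - 1
            2.0, 20.0, 200.0, 2000.0,  # run 2: Y = 2 * X
            5.0, 5.0, 5.0, 5.0],       # run 3: constant
    'RUN': [1] * 4 + [2] * 4 + [3] * 4,
})

# Points whose Y is missing (X and RUN values from SSM's comment)
new_df = pd.DataFrame({
    'X':   [1, 4, 12, 998] * 3,
    'RUN': [1] * 4 + [2] * 4 + [3] * 4,
})
```

Each run's known X values span every X requested in new_df. This matters because interp1d raises a ValueError by default for points outside the range it was built on.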

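from scipy.interpolate import interp1d

interpolating_functions = dict()
# Iterate over the run numbers actually present in df
# (range(1, max_runs) would skip run max_runs itself)
for run_number in df['RUN'].unique():
    run_data = df[df['RUN'] == run_number][['X', 'Y']]
    interpolating_functions[run_number] = interp1d(run_data['X'], run_data['Y'])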
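With thousands of runs, the loop above filters the whole of df once per run. The sketch below builds the same dictionary in a single pass with `DataFrame.groupby`, which yields one (run number, sub-DataFrame) pair per run. The data is hypothetical.

```python
import pandas as pd
from scipy.interpolate import interp1d

# Hypothetical sample data: two runs with known (X, Y) points
df = pd.DataFrame({
    'X':   [1, 10, 100, 1, 10, 100],
    'Y':   [0.0, 9.0, 99.0, 2.0, 20.0, 200.0],
    'RUN': [1, 1, 1, 2, 2, 2],
})

# groupby splits df by RUN once, so each run's rows are selected
# without re-scanning the whole DataFrame for every run
interpolating_functions = {
    run_number: interp1d(group['X'], group['Y'])
    for run_number, group in df.groupby('RUN')
}

# Run 2 is linear between (10, 20) and (100, 200), so X = 55 gives 110
print(interpolating_functions[2](55))
```

If some requested X values fall outside a run's range, interp1d accepts `bounds_error=False` together with a `fill_value` (e.g. `numpy.nan`) instead of raising.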

Now that we have an interpolating function for each run, we can use them to fill in the 'Y_interpolation' column of the new dataframe. This can be done with DataFrame.apply, which, with axis=1, calls a function on each row of the dataframe. So let's define an interpolate function that takes a row of new_df and uses its X value and run number to calculate an interpolated Y value.

def interpolate(row):
    # Look up the interpolating function for this row's run
    int_func = interpolating_functions[row['RUN']]
    # Calling the interp1d object directly is the public API; for a scalar
    # X it returns a 0-d array, so convert it to a plain float
    return float(int_func(row['X']))

Now we just use apply and our defined interpolate function.

new_df['Y_interpolation'] = new_df.apply(interpolate, axis=1)

I'm using pandas version 0.20.3, and this gives me a new_df that looks like this: [image: interpolation results]
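The row-wise apply makes one Python function call per row, which can get slow with thousands of runs. A different technique, not the one used in this answer, is to group new_df by RUN and call `numpy.interp` once per run on all of that run's X values. Below is a sketch on hypothetical data. Note that `np.interp` requires the known X values to be increasing, and that, unlike interp1d's default, it returns the endpoint Y values for out-of-range X instead of raising.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same layout as df and new_df in the answer
df = pd.DataFrame({
    'X':   [1, 10, 100, 1, 10, 100],
    'Y':   [0.0, 9.0, 99.0, 2.0, 20.0, 200.0],
    'RUN': [1, 1, 1, 2, 2, 2],
})
new_df = pd.DataFrame({'X': [4, 55, 4, 55], 'RUN': [1, 1, 2, 2]})

# Sort the known points by X within each run, since np.interp needs increasing xp
known = df.sort_values(['RUN', 'X'])
known_by_run = {run: g for run, g in known.groupby('RUN')}

new_df['Y_interpolation'] = np.nan
for run_number, rows in new_df.groupby('RUN'):
    pts = known_by_run[run_number]
    # One vectorized call per run instead of one Python call per row
    new_df.loc[rows.index, 'Y_interpolation'] = np.interp(
        rows['X'].to_numpy(), pts['X'].to_numpy(), pts['Y'].to_numpy()
    )

print(new_df)
```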

Answered by A. Entuluva
  • Thanks a lot @A. Entuluva for the general idea of how to approach this problem. However, I get the error "ValueError: Wrong number of items passed 2, placement implies 1" when using the `apply` function. Could you kindly share your code? I'm using this: `new_df1={'X': [1,4,12,998,1,4,12,998,1,4,12,998], 'RUN':[1,1,1,1,2,2,2,2,3,3,3,3]}` `new_df=pd.DataFrame(new_df1)` `new_df['Y_interpolation']=new_df.apply(interpolate,axis=1)` – SSM Feb 11 '19 at 10:52
  • Your code runs fine for me. What version of pandas are you using? The syntax for `apply` may have changed. – A. Entuluva Feb 11 '19 at 23:59
  • Hi @A.Entuluva, I'm using 0.16.2. I'll try again after upgrading to 0.20.3. Thanks. – SSM Feb 12 '19 at 11:53
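For reference, here is a self-contained sketch of the whole pipeline using the new_df from SSM's comment. The df values are hypothetical. It was written for a recent pandas and SciPy, and whether it avoids the "Wrong number of items" error on pandas 0.16.2 is untested. The row function returns a plain float rather than an array, so each row contributes a single scalar to the one Y_interpolation column.

```python
import pandas as pd
from scipy.interpolate import interp1d

# Hypothetical known points for runs 1-3 (made-up Y values); X spans 1..1000,
# so every X in new_df lies inside each run's interpolation range
df = pd.DataFrame({
    'X':   [1, 10, 100, 1000] * 3,
    'Y':   [0.0, 9.0, 99.0, 999.0, 2.0, 20.0, 200.0, 2000.0, 5.0, 5.0, 5.0, 5.0],
    'RUN': [1] * 4 + [2] * 4 + [3] * 4,
})

# new_df exactly as in SSM's comment
new_df1 = {'X': [1, 4, 12, 998, 1, 4, 12, 998, 1, 4, 12, 998],
           'RUN': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]}
new_df = pd.DataFrame(new_df1)

# One interp1d per run, built in a single groupby pass
interpolating_functions = {
    run: interp1d(g['X'], g['Y']) for run, g in df.groupby('RUN')
}

def interpolate(row):
    # Return a plain float so apply(axis=1) produces a single column
    return float(interpolating_functions[row['RUN']](row['X']))

new_df['Y_interpolation'] = new_df.apply(interpolate, axis=1)
print(new_df)
```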