Linear 1D interpolation on multiple datasets using loops

Question

I'd like to perform linear interpolation using the scipy.interpolate library. The dataset looks something like this: [image: dataframe for interpolation between X and Y for different RUNs]

I'd like to use the interpolated functions to find the missing Y values in this dataset: [image: dataframe to apply the interpolation functions to]

Only 3 runs are shown here, but my real dataset will run into thousands of runs. I'd appreciate advice on how to loop over the runs to build the interpolation functions.

from scipy.interpolate import interp1d
InterpolatedFunction = {}
for RUNNumber in range(TotalRuns):
    # X and Y should be the values belonging to this run only
    InterpolatedFunction[RUNNumber] = interp1d(X, Y)

score 1 · Accepted Answer · answered Feb 08 '19 at 17:17

As I understand it, you want a separate interpolation function defined for each run. Then you want to apply these functions to a second dataframe. I defined a dataframe df with columns ['X', 'Y', 'RUN'], and a second dataframe, new_df with columns ['X', 'Y_interpolation', 'RUN'].

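from scipy.interpolate import interp1d

interpolating_functions = dict()
# Assuming max_runs is the highest run number, the range must end at max_runs + 1
for run_number in range(1, max_runs + 1):
    run_data = df[df['RUN'] == run_number][['X', 'Y']]
    interpolating_functions[run_number] = interp1d(run_data['X'], run_data['Y'])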
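
As a minimal sketch of the same idea, the per-run functions can also be built with `groupby`, which avoids having to know the run numbers in advance. The sample values in `df` below are hypothetical, since the original dataframe was only shown as an image.

```python
import pandas as pd
from scipy.interpolate import interp1d

# Hypothetical sample data with the columns described above
df = pd.DataFrame({
    'X':   [0, 10, 20, 0, 10, 20],
    'Y':   [0.0, 1.0, 4.0, 5.0, 6.0, 9.0],
    'RUN': [1, 1, 1, 2, 2, 2],
})

# One linear interpolator per run, keyed by the run number
interpolating_functions = {
    run_number: interp1d(run_data['X'], run_data['Y'])
    for run_number, run_data in df.groupby('RUN')
}

# Evaluate run 1 halfway between the points (0, 0.0) and (10, 1.0)
y_mid = float(interpolating_functions[1](5))
print(y_mid)
```

Because the dictionary keys come straight from the 'RUN' column, this works even when the run numbers are not consecutive.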

Now that we have interpolating functions for each run, we can use them to fill in the 'Y_interpolation' column in a new dataframe. This can be done using the apply function, which takes a function and applies it to each row in a dataframe. So let's define an interpolate function that will take a row of this new df and use the X value and the run number to calculate an interpolated Y value.

def interpolate(row):
    int_func = interpolating_functions[row['RUN']]
    # Calling the interp1d object directly is the public API; for a scalar
    # input it returns a 0-d array, so convert it to a plain float
    return float(int_func(row['X']))
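
One thing to watch for: by default, `interp1d` raises a `ValueError` when asked for an X value outside the range it was built from. The sketch below, using made-up sample points, shows the default behaviour and two alternatives that `interp1d` supports: returning NaN outside the range, or extrapolating linearly. Which of these is appropriate depends on your data.

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical known points for a single run
x = np.array([0.0, 10.0, 20.0])
y = np.array([0.0, 1.0, 4.0])

# Default: out-of-range inputs raise ValueError
strict = interp1d(x, y)
try:
    strict(25.0)
    raised = False
except ValueError:
    raised = True

# bounds_error=False returns the fill value (NaN by default) outside the range
nan_outside = interp1d(x, y, bounds_error=False)

# fill_value="extrapolate" continues the last segment's line past the ends
extrapolating = interp1d(x, y, fill_value="extrapolate")

print(raised, float(nan_outside(25.0)), float(extrapolating(25.0)))
```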

Now we just use apply and our defined interpolate function.

new_df['Y_interpolation'] = new_df.apply(interpolate,axis=1)
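
With thousands of runs, a row-by-row `apply` can get slow. As an alternative sketch (not the approach used above), the loop below processes one run at a time with `np.interp`, which interpolates all of a run's X values in a single call. The data is hypothetical. Note that `np.interp` expects the known X values in increasing order and, unlike `interp1d`, clamps to the end values outside the known range instead of raising an error.

```python
import numpy as np
import pandas as pd

# Hypothetical known data and query points
df = pd.DataFrame({
    'X':   [0, 10, 20, 0, 10, 20],
    'Y':   [0.0, 1.0, 4.0, 5.0, 6.0, 9.0],
    'RUN': [1, 1, 1, 2, 2, 2],
})
new_df = pd.DataFrame({'X': [5, 15, 5, 15], 'RUN': [1, 1, 2, 2]})

new_df['Y_interpolation'] = np.nan
for run_number, rows in new_df.groupby('RUN'):
    # Known points for this run, sorted by X as np.interp requires
    known = df[df['RUN'] == run_number].sort_values('X')
    new_df.loc[rows.index, 'Y_interpolation'] = np.interp(
        rows['X'].values, known['X'].values, known['Y'].values
    )

print(new_df)
```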

I'm using pandas version 0.20.3, and the apply call above gives me a new_df with the Y_interpolation column filled in.

Thanks a lot @A. Entuluva for the general idea of how to approach this problem. However, I get the error "ValueError: Wrong number of items passed 2, placement implies 1" when I try to use the `apply` function. Could you kindly share your code? I'm using this: `new_df1 = {'X': [1,4,12,998,1,4,12,998,1,4,12,998], 'RUN': [1,1,1,1,2,2,2,2,3,3,3,3]}; new_df = pd.DataFrame(new_df1); new_df['Y_interpolation'] = new_df.apply(interpolate, axis=1)` — SSM, Feb 11 '19 at 10:52
Your code runs fine for me. What version of pandas are you using? The syntax for `apply` may have changed. — A. Entuluva, Feb 11 '19 at 23:59
Hi @A. Entuluva, I'm using 0.16.2. I'll try the same after upgrading to 0.20.3. Thanks. — SSM, Feb 12 '19 at 11:53
