Trying to loop through columns in a dataframe performing linear interpolation

Question

I have code that finds a value through linear interpolation (shown below). It works fine when I specify the column I would like to interpolate. However, when I try to perform the linear interpolation on every column using the second example, I get:

AttributeError: 'list' object has no attribute 'values'

How do I loop through the entire dataframe (data_1, data_2, and avg_val)? Is there a linear interpolation function in Python or pandas? I suspect this problem comes from my lack of understanding of the different object types in Python.

import pandas as pd
import numpy as np

#data set example
df = pd.DataFrame({'data_1' : [12.9, 11.8, 11.1, 10.4, 10.6, 8.9 , 7.7 , 7.9 ],
                   'data_2' : [ 8.3, 10.2, 14.1, 16.4, 10.1, 8.3 , 9.9 , 8.8 ],
                'date_time': ['2018-09-01 00:00:00', '2018-09-01 00:10:00', '2018-09-01 00:20:00', '2018-09-01 00:30:00', '2018-09-01 00:40:00', '2018-09-01 00:50:00', '2018-09-01 01:00:00', '2018-09-01 01:10:00']})

df = df.set_index('date_time')

df['avg_val'] = df.mean(axis=1)

#lookup table to perform linear interpolation
df_lookup = pd.DataFrame({'lookuplow'  :  [7   , 7.5 , 8   , 8.5 , 9   , 9.5 , 10  , 10.5, 11  , 11.5, 12  , 12.5, 13  , 13.5, 14  , 14.5],
                         'lookuphigh'  :  [7.5 , 8   , 8.5 , 9   , 9.5 , 10  , 10.5, 11  , 11.5, 12  , 12.5, 13  , 13.5, 14  , 14.5, 15 ],
                         'valuelow'    :  [470 , 583 , 713 , 857 , 1015, 1177, 1334, 1469, 1560, 1600, 1600, 1600, 1600, 1600, 1600, 1600],
                         'valuehigh'   :  [583 , 713 , 857 , 1015, 1177, 1334, 1469, 1560, 1600, 1600, 1600, 1600, 1600, 1600, 1600, 1600]})

#setting up variables for loop
pos_val = []
refnum = []
cutout = 15
for i in range(len(df)):
    ws = df['avg_val'].iloc[i]  # positional access; the index holds date_time strings

    if ws > cutout:
        pp = 0
    else:
        mask = (df_lookup['lookuplow'] <= ws) & (df_lookup['lookuphigh'] >= ws)
        ok_now = df_lookup.loc[mask]
        pp = (ok_now['valuelow'] + (ws - ok_now['lookuplow'])*(ok_now['valuehigh'] - ok_now['valuelow']) / (ok_now['lookuphigh'] - ok_now['lookuplow']))

    pos_val.append(pp.values)
    refnum.append(ws)

james = pd.DataFrame(np.concatenate(pos_val))
df = df.reset_index()
df['avg_pos_val'] = james

Second example of trying to loop through every column:

for j in range(len(df.columns)):
    #setting up variables for loop
    pos_val = []
    refnum = []

    for i in range(len(df)):
        ws = df.iloc[i,j]

        if ws > cutout:
            pp = 0
        else:
            mask = (df_lookup['lookuplow'] <= ws) & (df_lookup['lookuphigh'] >= ws)
            ok_now = df_lookup.loc[mask]
            pp = (ok_now['valuelow'] + (ws - ok_now['lookuplow'])*(ok_now['valuehigh'] - ok_now['valuelow']) / (ok_now['lookuphigh'] - ok_now['lookuplow']))

        pos_val.append(pp)
        refnum.append(ws)

    james = pd.DataFrame(np.concatenate(pos_val.values))
    df = df.reset_index()
    df['%s_pos' % (df.columns[j])] = james
    df = df.set_index('date_time')

Parfait, thanks for the interest. My apologies, I don't think I use refnum in either example. I think refnum is a holdover from when I thought I could return ws next to pp. Should I delete it from my examples? — getaglow, Dec 02 '18 at 22:44
Simply remove the `.values` on *pos_val* in the second code block; the first code block never applies it to the list: `pd.DataFrame(np.concatenate(pos_val))`. — Parfait, Dec 02 '18 at 23:18
Thanks Parfait. I tried removing the .values and now I receive: ValueError: all the input arrays must have same number of dimensions — getaglow, Dec 03 '18 at 14:40
It's faulting on: ---> 21 james = pd.DataFrame(np.concatenate(pos_val)) — getaglow, Dec 03 '18 at 14:41
I got the code to work by taking Parfait's advice and removing the .values. Then I replaced np.concatenate with np.hstack. I don't understand why it worked, but it is related to numpy arrays. See the following answer: https://stackoverflow.com/questions/38848759/valueerror-all-the-input-arrays-must-have-same-number-of-dimensions — getaglow, Dec 03 '18 at 15:01
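
A likely explanation for the np.hstack fix, offered as a sketch rather than a confirmed diagnosis: data_2 contains 16.4, which is above cutout = 15, so for that row the loop appends the plain scalar 0 instead of a one-element Series. np.concatenate refuses to join a 0-dimensional value with 1-dimensional arrays, while np.hstack first passes every item through np.atleast_1d. The list below is a hypothetical stand-in for pos_val, not the values the loop actually produces.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pos_val: one-element Series from the lookup
# branch, plus the scalar 0 that the cutout branch appends.
pos_val = [pd.Series([1600.0]), 0, pd.Series([687.0])]

# np.concatenate needs every input to have the same number of dimensions,
# so mixing 1-D data with the 0-D scalar raises ValueError.
try:
    np.concatenate(pos_val)
    concatenate_failed = False
except ValueError:
    concatenate_failed = True

# np.hstack promotes each item to at least 1-D before joining.
stacked = np.hstack(pos_val)

print(concatenate_failed)
print(stacked)
```

Appending np.atleast_1d(pp) inside the loop would be another way to keep np.concatenate usable.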

score 0 · Answer 1 · answered Dec 03 '18 at 16:02

This is the solution I found. If there is a built-in linear interpolation function in Python, please point me to it. Thank you, Parfait, for your help in getting to a solution.

for j in range(len(df.columns)):
    print(j)
    #setting up variables for loop
    pos_val = []
    refnum = []

    for i in range(len(df)):
        ws = df.iloc[i, j]

        if ws > cutout:
            pp = 0
        else:
            mask = (df_lookup['lookuplow'] <= ws) & (df_lookup['lookuphigh'] >= ws)
            ok_now = df_lookup.loc[mask]
            pp = (ok_now['valuelow'] + (ws - ok_now['lookuplow'])*(ok_now['valuehigh'] - ok_now['valuelow']) / (ok_now['lookuphigh'] - ok_now['lookuplow']))

        pos_val.append(pp)
        refnum.append(ws)

    #np.hstack promotes the scalar 0 to a 1-D array, so it can be joined with the Series values
    james = pd.DataFrame(np.hstack(pos_val))
    #james = pd.DataFrame(np.concatenate(pos_val))
    df = df.reset_index()  #need to reset index in order to add james to dataframe
    df['%s_pos' % (df.columns[j+1])] = james  #adding 1 to accommodate the date_time column not being the index
    df = df.set_index('date_time')
    print(james)
    print(df)
    james = james.drop(j)  #has no lasting effect: james is rebuilt on the next iteration
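
On the question of a built-in function: NumPy's np.interp performs piecewise-linear interpolation on whole arrays, so the row-by-row loop may not be needed. The sketch below is one possible rewrite, not the original poster's method. It assumes the lookup table can be collapsed into a single list of breakpoints (xp, from 7 to 15) with one value per breakpoint (fp); both names are introduced here for illustration. Readings above the cutout map to 0 through right=0, and readings below 7, which the original code does not handle, are set to NaN as an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'data_1': [12.9, 11.8, 11.1, 10.4, 10.6, 8.9, 7.7, 7.9],
    'data_2': [8.3, 10.2, 14.1, 16.4, 10.1, 8.3, 9.9, 8.8],
    'date_time': ['2018-09-01 00:00:00', '2018-09-01 00:10:00',
                  '2018-09-01 00:20:00', '2018-09-01 00:30:00',
                  '2018-09-01 00:40:00', '2018-09-01 00:50:00',
                  '2018-09-01 01:00:00', '2018-09-01 01:10:00'],
}).set_index('date_time')
df['avg_val'] = df.mean(axis=1)

# Breakpoints 7.0, 7.5, ..., 15.0 (lookuplow plus the final lookuphigh).
xp = np.linspace(7, 15, 17)
# Value at each breakpoint (valuelow plus the final valuehigh).
fp = [470, 583, 713, 857, 1015, 1177, 1334, 1469, 1560,
      1600, 1600, 1600, 1600, 1600, 1600, 1600, 1600]

# Interpolate each source column in one vectorized call.
# right=0 reproduces the cutout rule (readings above 15 give 0);
# left=np.nan is an assumption for readings below the table.
for col in ['data_1', 'data_2', 'avg_val']:
    df[col + '_pos'] = np.interp(df[col], xp, fp, left=np.nan, right=0)

print(df)
```

Because np.interp returns exactly one value per input, a reading that lands exactly on a breakpoint yields a single result, whereas the mask with <= and >= on both sides matches two lookup rows for such a reading.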
