lookup, interpolate and match between pandas dataframes

Question

new here, and this is my first ever question, I hope I can reciprocate by answering other peoples questions in the future.

I have two pandas dataframes, A & B, each containing float64 values, both with the same dimensions and same index and column float64 numerical entries. E.g.

A
                        0.5       1.0       1.5       2.0
    -1.947417e-15  0.015178  0.045640  0.074928  0.100943
    -5.000000e-01  0.047697  0.149200  0.256133  0.359234
    -1.000000e+00  0.082972  0.272634  0.494826  0.733841
    -1.500000e+00  0.120121  0.417265  0.811659  1.303373
    -2.000000e+00  0.155978  0.572880  1.206698  2.159998


B
                        0.5       1.0       1.5       2.0
    -1.947417e-15  0.003859  0.018968  0.042486  0.070368
    -5.000000e-01  0.003859  0.036093  0.100487  0.187447
    -1.000000e+00 -0.007340  0.018968  0.107873  0.254083
    -1.500000e+00 -0.029713 -0.038906  0.042486  0.178620
    -2.000000e+00 -0.059926 -0.134084 -0.091504  0.032577

I would like to be able to lookup a fixed value (e.g. 0.25) in each data column in A, returning the (linearly) interpolated index value for each column (but only when the value exists in the range of the column data values - so not column 0.5 in A). Dataframe A will always have the column data monotonically increasing.

Once I have the interpolated index values for each column in Dataframe A for a data value, say 0.25, then I would like to retrieve the values within Dataframe B that have the same 'coordinates', i.e. the interpolated index and equivalent column value.

So for example, a lookup of 0.25 in A, then for column 1.5 would interpolate the index between 0 and -0.5, somewhere close to but greater than the value -0.5. The value returned from B would be a lookup of the index in B for column 1.5, so returning somewhere between 0.042486 and 0.100487, close to but less than 0.100487 value. I would want to return these '0.25' coordinate values in A and lookup & interpolate for the equivalent values in each column in B.

Slightly confusing, does it make sense?

I've done a lot of searching and can find some pandas dataframe interpolation methods, but generally they are in the context of filling in of NaNs.

I'm very appreciative of the help, so thank you in advance for reading and thinking about this. Maybe it is a simple solution, but I can't think of it or find it online anywhere.

Second part is solved here by Nick Holden, very nice, tku. [https://stackoverflow.com/questions/10464738/interpolation-on-dataframe-in-pandas] First part is getting a set of interpolated index and column values for the line z = some_value through the dataframe A (e.g. z = 0.25) — python_newbie, Nov 24 '17 at 20:33
I didn't link correctly to Nick Holden partial solution to this, [https://stackoverflow.com/a/30733959/8195971] — python_newbie, Nov 24 '17 at 20:52
I have solved this now, I'll post my solution after I've tidied it up a bit. — python_newbie, Nov 26 '17 at 12:00

python_newbie · Answer 1 · 2017-11-27T18:43:34.747

I'm sure this can be improved, none-the-less, I'll post it anyway...

import pandas as pd
from   numpy import nan
import numpy as np

A = np.array([0.015178, 0.045640, 0.074928, 0.100943,\
              0.047697, 0.149200, 0.256133, 0.359234,\
              0.082972, 0.272634, 0.494826, 0.733841,\
              0.120121, 0.417265, 0.811659, 1.303373,\
              0.155978, 0.572880, 1.206698, 2.159998]).reshape(5,4)

B = np.array([0.003859, 0.018968, 0.042486, 0.070368,\
              0.003859, 0.036093, 0.100487, 0.187447,\
             -0.007340, 0.018968, 0.107873, 0.254083,\
             -0.029713,-0.038906, 0.042486, 0.178620,\
             -0.059926,-0.134084,-0.091504, 0.032577]).reshape(5,4)

A = pd.DataFrame(A,index=np.arange(0,-2.1,-0.5),columns=np.arange(0.5,2.1,0.5))

B = pd.DataFrame(B,index=np.arange(0,-2.1,-0.5),columns=np.arange(0.5,2.1,0.5))


def get_level_intersection(grid, level, axis=0): 

    if (axis==0): 
        gd = grid.copy(deep=True)
    else:
        gd = grid.T.copy(deep=True)

    intersection = [] 
    columns=['a_value','b_value']
    if (axis==1): 
        columns=columns[::-1]

    for i in np.arange(len(gd.axes[1])):
        b_ind = gd.axes[1][i] 
        a_ind1 = gd[gd[:]>level].get(gd.axes[1][i]).first_valid_index()

        if (a_ind1 is not None): 
            a_loc = gd.axes[0].get_loc(a_ind1) 
            val_a1 = gd.get_value(a_ind1,gd.axes[1][i]) 

            if (a_loc != 0): 
                val_a = gd.get_value(gd.axes[0][a_loc-1],gd.axes[1][i])
                val_a_int = gd.axes[0][a_loc-1] + (gd.axes[0][1]-gd.axes[0][0]) * (level-val_a)/(val_a1-val_a)
            else:
                val_a = gd.get_value(gd.axes[0][0],gd.axes[1][i])
                val_a_int = (gd.axes[0][1]-gd.axes[0][0]) * (level-val_a)/(val_a1-val_a)

            intersection.append([val_a_int,gd.axes[1][i]])

    return pd.DataFrame(intersection, columns=columns)


def grid_interpolate(grid, x, y):
    gd = grid.copy(deep=True)
    if x not in gd.index:
        gd.ix[x] = nan
        gd = gd.sort()
        gd = gd.interpolate(method='index', axis=0).ffill(axis=0).bfill(axis=0)

    if y not in gd.columns.values:
        gd = gd.reindex(columns=np.append(gd.columns.values, y))
        gd = gd.sort_index(axis=1)
        gd = gd.interpolate(method='index', axis=1).ffill(axis=1).bfill(axis=1)

    return gd[y][x]


def curve_slice(grid1, grid2, level):

    gd1 = grid1.copy(deep=True)
    gd2 = grid2.copy(deep=True)

    intersection = []

    inter_cols = get_level_intersection(gd1, level)
    inter_rows = get_level_intersection(gd1, level, axis=1)
    intersection = pd.concat([inter_cols,inter_rows]).sort(inter_cols.columns[0]).reset_index(drop=True)
    intersection = intersection.append(pd.DataFrame(columns=['val']))

    for index, row in intersection.iterrows():
        intersection['val'][index]=grid_interpolate(gd2,row[0],row[1])

    return pd.DataFrame(intersection)


print(curve_slice(A,B,0.25))
>>
    a_value   b_value         val
0 -2.000000  0.612763 -0.07665051
1 -1.500000  0.718546 -0.03373118
2 -1.000000  0.940331  0.01582844
3 -0.908315  1.000000   0.0221082
4 -0.500000  1.471323  0.09679377
5 -0.483077  1.500000  0.09852392
6 -0.288545  2.000000   0.1379331

As a check,

print(curve_slice(A,A,0.25))
>>
    a_value   b_value   val
0 -2.000000  0.612763  0.25
1 -1.500000  0.718546  0.25 
2 -1.000000  0.940331  0.25
3 -0.908315  1.000000  0.25
4 -0.500000  1.471323  0.25
5 -0.483077  1.500000  0.25
6 -0.288545  2.000000  0.25

and

print(get_level_intersection(A,0.25))
>>
    a_value  b_value
0 -0.908315      1.0
1 -0.483077      1.5
2 -0.288545      2.0

print(get_level_intersection(A,0.25,axis=1))
>>
    b_value  a_value
0  1.471323     -0.5
1  0.940331     -1.0
2  0.718546     -1.5
3  0.612763     -2.0

lookup, interpolate and match between pandas dataframes

1 Answers1