I have two dataframes, let's say dfr1 and dfr2. dfr1 looks like:
x,y,z
1,2,3
1,1,3
1,2,3
1,4,3
1,5,3
while dfr2 looks like:
p1,p2,p3
100,200,300
100,400,300
100,500,300
I would like to apply a procedure to each row of dfr1 using dfr2. Say that each row of dfr1 holds the coordinates of a point, while each column of dfr2 holds the coordinates of another point, which we can call a "station". For each point in dfr1, I would like to compute its distance to every point in dfr2 and then sort the distances.
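To make the goal concrete, here is a minimal sketch of the computation described above, written with NumPy broadcasting instead of a row-by-row function. It assumes the two small example frames shown earlier; the names points, stations, distances and nearest_first are only illustrative.

```python
import numpy as np
import pandas as pd

# The two example frames from above
dfr1 = pd.DataFrame({"x": [1, 1, 1, 1, 1], "y": [2, 1, 2, 4, 5], "z": [3, 3, 3, 3, 3]})
dfr2 = pd.DataFrame({"p1": [100, 100, 100], "p2": [200, 400, 500], "p3": [300, 300, 300]})

points = dfr1.to_numpy(dtype=float)        # shape (5, 3): one row per point
stations = dfr2.to_numpy(dtype=float).T    # shape (3, 3): one row per station

# Pairwise differences via broadcasting: shape (n_points, n_stations, 3)
diff = points[:, None, :] - stations[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))    # shape (5, 3)

# One row per point, one column per station
distances = pd.DataFrame(dist, index=dfr1.index, columns=dfr2.columns)

# For each point, the station labels ordered from nearest to farthest
order = np.argsort(dist, axis=1)
nearest_first = pd.DataFrame(dfr2.columns.to_numpy()[order], index=dfr1.index)
```

Here distances has one row per point of dfr1 and one column per station, and nearest_first lists the station labels sorted by distance for each point.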
Let's define the function test:
import math
def test(rr):
    # p1_x, p1_y, p1_z: coordinates of one station, which should come from dfr2
    res = math.sqrt((rr[0]-p1_x)**2 + (rr[1]-p1_y)**2 + (rr[2]-p1_z)**2)
    return res
I have learnt how to apply test to each row of dfr1:
res = dfr1.apply(test, axis=1)
How can I pass some elements of dfr2 to the function? I would rather not read dfr2 inside the function, because it will have a more complicated structure. Here is an example of my real file with the station data:
,station_1,station_3,station_3
coordiante x ,100,200,300
coordiante y ,100,400,300
coordiante z ,100,500,300
2018-01-01 00:00:00 ,1,2,3
2018-01-01 01:00:00 ,2,2,3
2018-01-01 02:00:00 ,3,2,3
2018-01-01 03:00:00 ,4,2,3
2018-01-01 04:00:00 ,4,NaN,3
2018-01-01 05:00:00 ,3,2,3
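As a possible way to hand part of this file to the function without reading it inside the function, the sketch below splits the frame by position into the coordinate rows and the time rows, then forwards the coordinate block through the args parameter of DataFrame.apply. It assumes the file is laid out exactly as shown above (three coordinate rows first); the names coords, series and distances_to_stations are hypothetical. Note that pandas renames a duplicated column header when it reads the file.

```python
import io
import numpy as np
import pandas as pd

# The station file exactly as shown above
raw = """,station_1,station_3,station_3
coordiante x ,100,200,300
coordiante y ,100,400,300
coordiante z ,100,500,300
2018-01-01 00:00:00 ,1,2,3
2018-01-01 01:00:00 ,2,2,3
2018-01-01 02:00:00 ,3,2,3
2018-01-01 03:00:00 ,4,2,3
2018-01-01 04:00:00 ,4,NaN,3
2018-01-01 05:00:00 ,3,2,3
"""
df2 = pd.read_csv(io.StringIO(raw), index_col=0)

coords = df2.iloc[0:3]   # assumption: the first three rows hold x, y, z per station
series = df2.iloc[3:]    # the remaining rows hold the hourly values

dfr1 = pd.DataFrame({"x": [1, 1, 1, 1, 1], "y": [2, 1, 2, 4, 5], "z": [3, 3, 3, 3, 3]})

def distances_to_stations(rr, station_coords):
    # rr: one row (x, y, z) of dfr1
    # station_coords: frame with 3 rows (x, y, z) and one column per station
    diff = station_coords.to_numpy(dtype=float) - rr.to_numpy(dtype=float)[:, None]
    return pd.Series(np.sqrt((diff ** 2).sum(axis=0)), index=station_coords.columns)

# Extra positional arguments are forwarded to the function through `args`
res = dfr1.apply(distances_to_stations, axis=1, args=(coords,))
```

Because the function returns a Series, res is a frame with one row per point and one column per station.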
I found this solution in another post here on Stack Overflow:
def func(x, other):
    other_value = other.loc[x.name]
    return your_actual_method(x, other_value)
result = df1.apply(lambda x: func(x, df2))
Since I am going to use only one row at a time, I would modify it as:
def func(x, coords, other_value):
    # coords: the three coordinate rows of df2; other_value: one time row of df2
    return your_actual_method(x, coords, other_value)
for i in range(4, 13):
    result = df1.iloc[0:5].apply(lambda x: func(x, df2.iloc[0:3], df2.iloc[i]), axis=1)
I do not like the fact that I have to put it in a loop in order to pass a different row to the function at each step. I am a bit concerned about the computation speed. What do you think?
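On the speed concern, one observation is that the station coordinates do not change from one time row to the next, so the distances could be computed once outside the loop and only the time-dependent part handled per step. The sketch below illustrates this under a loud assumption that is not stated above: a station whose value is NaN at a given time is simply skipped. The arrays are copied from the examples, and the name nearest_per_step is hypothetical.

```python
import numpy as np

# Points (rows of dfr1) and stations (one row per station), from the examples above
points = np.array([[1, 2, 3], [1, 1, 3], [1, 2, 3], [1, 4, 3], [1, 5, 3]], dtype=float)
stations = np.array([[100, 100, 100], [200, 400, 500], [300, 300, 300]], dtype=float)

# Computed once: the distances do not depend on the time row
dist = np.linalg.norm(points[:, None, :] - stations[None, :, :], axis=2)  # shape (5, 3)

# Two of the time rows from the station file (the second has a missing value)
values = np.array([[1, 2, 3], [4, np.nan, 3]], dtype=float)

nearest_per_step = []
for row in values:
    available = ~np.isnan(row)                    # assumption: NaN means "skip this station"
    masked = np.where(available, dist, np.inf)    # unavailable stations can never be nearest
    nearest_per_step.append(masked.argmin(axis=1))  # index of the nearest usable station per point
```

The loop body then contains only cheap array operations, while the square roots are evaluated a single time.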