I have two dataframes, let's say dfr1 and dfr2. dfr1 looks like:

x,y,z
1,2,3
1,1,3
1,2,3
1,4,3
1,5,3

while dfr2 looks like:

p1,p2,p3
100,200,300  
100,400,300
100,500,300                 

I would like to apply a procedure to each row of dfr1 by using dfr2. Let's say that each row of dfr1 holds the coordinates of a point. On the other hand, the columns of dfr2 hold the coordinates of other points that we can call "stations". For each point of dfr1, I would like to compute the distance to each point of dfr2 and then sort the distances.

Let's define the function test:

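import math
def test(rr):
    # p1_x, p1_y, p1_z are the coordinates of the first station;
    # they must be defined before test is called
    res = math.sqrt((rr.iloc[0]-p1_x)**2 + (rr.iloc[1]-p1_y)**2 + (rr.iloc[2]-p1_z)**2)
    return res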

I have learnt to apply test to each row of dfr1:

res = dfr1.apply(test, axis=1)
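
A minimal sketch of one way to hand station coordinates to the row function: `DataFrame.apply` forwards extra positional arguments through its `args` parameter. The helper name `distance_to_station` and the choice of column `p1` as the station are assumptions made for illustration; they are not part of the original code.

```python
import math
import pandas as pd

dfr1 = pd.DataFrame({"x": [1, 1, 1, 1, 1], "y": [2, 1, 2, 4, 5], "z": [3, 3, 3, 3, 3]})
dfr2 = pd.DataFrame({"p1": [100, 100, 100], "p2": [200, 400, 500], "p3": [300, 300, 300]})

def distance_to_station(rr, station):
    # rr: one row of dfr1 (x, y, z); station: one column of dfr2 (hypothetical helper)
    return math.sqrt(
        (rr["x"] - station.iloc[0]) ** 2
        + (rr["y"] - station.iloc[1]) ** 2
        + (rr["z"] - station.iloc[2]) ** 2
    )

# args= forwards dfr2["p1"] as the second positional argument of the function
res = dfr1.apply(distance_to_station, axis=1, args=(dfr2["p1"],))
```

With this pattern the function never has to read dfr2 itself; whatever slice of dfr2 is needed is chosen at the call site.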

How can I pass some elements of dfr2 to the function? I would like not to read it inside the function, because dfr2 will have a more complicated structure. Here is an example of my real file with the station data:

            ,station_1,station_3,station_3
coordinate x          ,100,200,300
coordinate y          ,100,400,300
coordinate z          ,100,500,300
2018-01-01 00:00:00   ,1,2,3
2018-01-01 01:00:00   ,2,2,3
2018-01-01 02:00:00   ,3,2,3
2018-01-01 03:00:00   ,4,2,3
2018-01-01 04:00:00   ,4,NaN,3
2018-01-01 05:00:00   ,3,2,3
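
A rough sketch of how a file shaped like the one above could be split into a coordinate block and a time-series block, so that each block can be passed to a function separately. The column names `station_a`, `station_b`, `station_c` and the variable names `coords` and `values` are placeholders, and only the first three time steps are reproduced.

```python
import pandas as pd

# Small sample shaped like the station file; column names are placeholders
df2 = pd.DataFrame(
    {
        "station_a": [100, 100, 100, 1, 2, 3],
        "station_b": [200, 400, 500, 2, 2, 2],
        "station_c": [300, 300, 300, 3, 3, 3],
    },
    index=[
        "coordinate x", "coordinate y", "coordinate z",
        "2018-01-01 00:00:00", "2018-01-01 01:00:00", "2018-01-01 02:00:00",
    ],
)

coords = df2.iloc[0:3]         # first three rows: x, y, z of every station
values = df2.iloc[3:].copy()   # remaining rows: one row per time step
values.index = pd.to_datetime(values.index)  # parse the timestamps once
```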

I have found this solution in another post here on Stack Overflow:

def func(x, other):
    other_value = other.loc[x.name]
    return your_actual_method(x, other_value)

result = df1.apply(lambda x: func(x, df2))

Since I am going to use only one row at a time, I would modify it as follows:

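def func(x, coords, values):
    # x: one row of df1; coords: the three coordinate rows of df2;
    # values: the single data row of df2 used at this step
    return your_actual_method(x, coords, values)

for i in range(4, 13):
    result = df1.iloc[0:5].apply(lambda x: func(x, df2.iloc[0:3], df2.iloc[i]), axis=1)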

I do not like the fact that I have to put it in a loop in order to pass a different row to the function at each step, and I am a bit concerned about the computation speed. What do you think?
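
On the speed concern, a possible alternative for the distance case is NumPy broadcasting, which computes every point-to-station distance in one step without `apply` or a Python loop. This is only a sketch: it assumes that each column of dfr2 is one station (as described above) and that the per-row operation really is a Euclidean distance, which the question presents only as an example. The names `distances` and `nearest_first` are hypothetical.

```python
import numpy as np
import pandas as pd

dfr1 = pd.DataFrame({"x": [1, 1, 1, 1, 1], "y": [2, 1, 2, 4, 5], "z": [3, 3, 3, 3, 3]})
dfr2 = pd.DataFrame({"p1": [100, 100, 100], "p2": [200, 400, 500], "p3": [300, 300, 300]})

points = dfr1.to_numpy(dtype=float)       # shape (n_points, 3)
stations = dfr2.to_numpy(dtype=float).T   # shape (n_stations, 3), one row per station

# Broadcasting: (n_points, 1, 3) - (1, n_stations, 3) -> (n_points, n_stations, 3)
diff = points[:, None, :] - stations[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# One row per point, one column per station
distances = pd.DataFrame(dist, index=dfr1.index, columns=dfr2.columns)

# Station labels ordered from nearest to farthest for each point
order = np.argsort(dist, axis=1)
nearest_first = pd.DataFrame(dfr2.columns.to_numpy()[order], index=dfr1.index)
```

Because the station coordinates do not change between time steps, a distance table like this could be computed once outside the loop, leaving only the time-dependent part inside it.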

diedro
  • What are `p1_x`, `p1_y` and `p_1z` in your function? – It_is_Chris Jul 07 '21 at 20:06
  • They are the coordinates of the "stations". For example, for the first station they are 100,200,300. You get the point. – diedro Jul 07 '21 at 20:10
  • How big are your dataframes? Also, if you are willing to use another library like sklearn, there is for example [pairwise_distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html#sklearn-metrics-pairwise-distances), which does the job, at least with the function `test` defined as the Euclidean norm in your question – Ben.T Jul 07 '21 at 20:16
  • and what do you mean "dfr2 will have a more complicate structure"? – Ben.T Jul 07 '21 at 20:20
  • My dataframe has 2000 rows and 244 columns. I could use pairwise_distances. However, the fact that I have to compute the distance is just an example. I have added in the question a piece of the real dataframe. – diedro Jul 07 '21 at 21:39

0 Answers