1

The first dataframe, df1, contains ids and their two corresponding coordinates. For each coordinate pair in the first dataframe, I have to loop through the second dataframe to find the pair with the least distance. I tried taking the individual coordinates and finding the distance between them, but it does not work as expected. I believe the coordinates have to be treated as a pair when computing the distance. I am not sure whether Python offers a method to achieve this.

For example, df1:

Id        Co1            Co2
334    30.371353      -95.384010
337    39.497448      -119.789623

df2

Id       Co1             Co2
339    40.914585      -73.892456
441    34.760395      -77.999260

dfloc3 =[[38.991512-77.441536],
         [40.89869-72.37637],
         [40.936115-72.31452],
         [30.371353-95.38401],
         [39.84819-75.37162],
         [36.929306-76.20035],
         [40.682342-73.979645]]


dfloc4 = [[40.914585,-73.892456],
          [41.741543,-71.406334],
          [50.154522,-96.88806],
          [39.743565,-121.795761],
          [30.027597,-89.91014],
          [36.51881,-82.560844],
          [30.449587,-84.23629],
          [42.920475,-85.8208]]
Ami Tavory
user3447653
  • What output do you expect? How are you calculating the distance between coordinates? Are you using Pythagoras? Please show us your code. – James K Aug 19 '16 at 17:24
  • This might help: http://codereview.stackexchange.com/q/28207/110101 or this: http://stackoverflow.com/q/1901139/4996248 – John Coleman Aug 19 '16 at 17:25
  • dfloc3 = [[38.991512-77.441536], [40.89869-72.37637], [40.936115-72.31452], [30.371353-95.38401], [39.84819-75.37162], [36.929306-76.20035], [40.682342-73.979645]] dfloc4 = [[40.914585,-73.892456], [41.741543,-71.406334], [50.154522,-96.88806], [39.743565,-121.795761], [30.027597,-89.91014], [36.51881,-82.560844], [30.449587,-84.23629], [42.920475,-85.8208]] – user3447653 Aug 19 '16 at 18:12
  • Are `dfloc3` and `dfloc4` your expected output? If so, how did you calculate them? Or are you trying to find the nearest point in `df2` for each point in `df1`? Or the nearest point in `dfloc3` for each point in `df1`? – Matthias Fripp Aug 19 '16 at 19:27

2 Answers

1

The code below creates a new column in df1 showing the Id of the nearest point in df2. (I can't tell from the question if this is what you want.) I'm assuming the coordinates are in a Euclidean space, i.e., that the distance between points is given by the Pythagorean Theorem. If not, you could easily use some other calculation instead of dist_squared.

import pandas as pd

df1 = pd.DataFrame(dict(Id=[334, 337], Co1=[30.371353, 39.497448], Co2=[-95.384010, -119.789623]))
df2 = pd.DataFrame(dict(Id=[339, 441], Co1=[40.914585, 34.760395], Co2=[-73.892456, -77.999260]))

def nearest(row, df):
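    # calculate squared Euclidean distance from the given row to all rows of df
    # (the square root is not needed to find the minimum)
    dist_squared = (row.Co1 - df.Co1) ** 2 + (row.Co2 - df.Co2) ** 2
    # find the index label of the closest row of df
    # (idxmin returns the label, which is what .loc expects)
    smallest_idx = dist_squared.idxmin()
    # return the Id for the closest row of df
    return df.loc[smallest_idx, 'Id']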

near = df1.apply(nearest, args=(df2,), axis=1)

df1['nearest'] = near
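
If the coordinates are latitude/longitude in degrees (an assumption; the question doesn't say so), one possible replacement for dist_squared is the haversine great-circle distance. This is only a minimal sketch: haversine_km and nearest_haversine are names made up for this example, and 6371 km is the usual mean Earth radius approximation.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in km; inputs are degrees (scalars or Series)
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def nearest_haversine(row, df):
    # Assumption: Co1 is latitude and Co2 is longitude, both in degrees
    dist = haversine_km(row.Co1, row.Co2, df.Co1, df.Co2)
    # idxmin gives the index label of the smallest distance
    return df.loc[dist.idxmin(), 'Id']

df1 = pd.DataFrame(dict(Id=[334, 337], Co1=[30.371353, 39.497448], Co2=[-95.384010, -119.789623]))
df2 = pd.DataFrame(dict(Id=[339, 441], Co1=[40.914585, 34.760395], Co2=[-73.892456, -77.999260]))

df1['nearest'] = df1.apply(nearest_haversine, args=(df2,), axis=1)
print(df1)
```

It plugs into apply the same way as nearest does, so only the distance calculation changes.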
Matthias Fripp
1

Given you can get your points into a list like so...

df1 = [[30.371353, -95.384010], [39.497448, -119.789623]]
df2 = [[40.914585, -73.892456], [34.760395, -77.999260]]

Import math, then create a function to make finding the distance easier:

import math    

def distance(pt1, pt2):
    return math.sqrt((pt1[0] - pt2[0])**2 + (pt1[1] - pt2[1])**2)

Then simply traverse your lists, saving the closest point for each entry:

for pt1 in df1:
    closestPoints = [pt1, df2[0]]
    for pt2 in df2:
        if distance(pt1, pt2) < distance(closestPoints[0], closestPoints[1]):
            closestPoints = [pt1, pt2]
    print("Point: " + str(closestPoints[0]) + " is closest to " + str(closestPoints[1]))

Outputs:

Point: [30.371353, -95.38401] is closest to [34.760395, -77.99926]
Point: [39.497448, -119.789623] is closest to [34.760395, -77.99926]
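
The inner loop above could probably also be written with the built-in min() and a key function. A rough sketch of that idea, using the same sample lists; closest_point is just a helper name chosen for this example:

```python
import math

def distance(pt1, pt2):
    # Straight-line (Euclidean) distance between two [x, y] pairs
    return math.hypot(pt1[0] - pt2[0], pt1[1] - pt2[1])

def closest_point(pt, candidates):
    # min() with a key returns the candidate with the smallest distance to pt
    return min(candidates, key=lambda other: distance(pt, other))

df1 = [[30.371353, -95.384010], [39.497448, -119.789623]]
df2 = [[40.914585, -73.892456], [34.760395, -77.999260]]

for pt1 in df1:
    print("Point: " + str(pt1) + " is closest to " + str(closest_point(pt1, df2)))
```

This still compares every point in df1 against every point in df2, so it does the same work as the nested loop, just with less bookkeeping.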
Nick M.
  • @Nick.M: I am getting the error 'list index out of range'. Not sure why I am getting this. I have added the list in the question. – user3447653 Aug 19 '16 at 18:13
  • Your first list needs to separate the x and y with a comma, like you did in the second one – Nick M. Aug 19 '16 at 18:17
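
To illustrate the comma issue from the comment above: without the comma, Python evaluates the subtraction and the inner list ends up with a single number, so indexing element 1 fails. A small sketch using the first pair from dfloc3 (broken and fixed are names chosen for this example):

```python
# Without a comma, this is one number: 38.991512 minus 77.441536
broken = [38.991512-77.441536]
# With a comma, this is an [x, y] pair
fixed = [38.991512, -77.441536]

print(len(broken))  # 1
print(len(fixed))   # 2

try:
    broken[1]  # there is no second element
except IndexError as err:
    print(err)  # list index out of range
```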