
How can I speed up the execution of this line:

from geopy import distance

...

df['Km'] = df.apply(lambda row: distance.distance(row['coord_1'], row['coord_2']).km, axis=1)

where coord_1 and coord_2 are two large sets of coordinates.

distance.distance is a geopy function (https://github.com/geopy/geopy/blob/master/geopy/distance.py)

Thanks in advance.

--- Update: I found a Cython implementation of the Vincenty formula at github.com/dmsul/cyvincenty.git. It greatly improved performance. ---

erchugo
  • 4
    You need to vectorize the `distance.distance` function (maybe it already is; check the documentation). Is the function your own code, or does it come from a third-party library? – Code Different Jul 01 '21 at 17:32
  • 1
    If the function is written in Python, which it probably is, implementing your own in C might help. Python is horrendously slow at calculating pretty much anything. – Kilves Jul 01 '21 at 17:47
  • The distance.distance function is from the geopy library. – erchugo Jul 01 '21 at 17:57

1 Answer

I replaced geopy with a Cython implementation of the Vincenty formula at github.com/dmsul/cyvincenty.git.

It greatly improved performance.

Thanks @Kilves. Your comment really put me on the right track.
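
For readers who cannot install a compiled extension, the vectorization suggested in the comments may also be worth trying. The sketch below is not the Vincenty formula and not what geopy computes: it swaps in the haversine formula, which treats the Earth as a sphere and can differ from an ellipsoidal result by up to roughly 0.5%. The function name `haversine_km`, the radius constant, and the assumption that each coordinate is a (latitude, longitude) pair in degrees are illustrative, not taken from the question.

```python
import numpy as np

# Mean Earth radius in km. Assumption: a sphere is accurate enough for the use case.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between arrays of points given in degrees."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(a, dtype=float)) for a in (lat1, lon1, lat2, lon2)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # Clip guards against tiny floating-point overshoots outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Hypothetical sample data: lists of (lat, lon) tuples, as the columns might hold.
coord_1 = [(0.0, 0.0), (48.8566, 2.3522)]
coord_2 = [(0.0, 1.0), (51.5074, -0.1278)]

# Split the tuples into separate latitude and longitude arrays.
lat1, lon1 = np.array(coord_1).T
lat2, lon2 = np.array(coord_2).T

km = haversine_km(lat1, lon1, lat2, lon2)
print(km)
```

With a DataFrame, the same split could be done with `np.array(df['coord_1'].tolist()).T`, and the result assigned to `df['Km']` in one step instead of calling `apply` row by row. If ellipsoidal accuracy matters, the Cython Vincenty route above is the safer choice.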

erchugo