
I was using the Google Maps Distance Matrix API in Python to calculate bicycling distances between pairs of points given as latitude and longitude. I was looping over almost 300,000 rows of data for a student project (I am studying Data Science with Python). I added a debug line to print the row # and distance every 10,000 rows, but after the loop hummed away for a while with no output, I stopped the kernel and changed it to every 1,000 rows. With that, it took about 5 minutes to reach row 1,000, and after over an hour it was only on row 70,000. Unbelievable. I stopped execution, and later that day got an email from Google saying I had used up my free trial. So not only did it run incredibly slowly, I can't even use it anymore for a student project without incurring enormous fees.

So I rewrote the code to use geometry and just calculate "as the crow flies" distance. Not really what I want, but short of any alternatives, that's my only option.
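
For reference, a minimal sketch of what that vectorized straight-line computation can look like, using the equirectangular approximation (the result dataframe and its column names match the code further down; 6371 km is the standard mean Earth radius):

import numpy as np

# equirectangular ("flat earth") approximation: adequate for short
# hops such as trips between nearby stations
lat1 = np.radians(result['from_lat'].to_numpy())
lon1 = np.radians(result['from_long'].to_numpy())
lat2 = np.radians(result['to_lat'].to_numpy())
lon2 = np.radians(result['to_long'].to_numpy())

x = (lon2 - lon1) * np.cos((lat1 + lat2) / 2)  # east-west, shrunk with latitude
y = lat2 - lat1                                # north-south
result['crow_km'] = 6371 * np.hypot(x, y)      # mean Earth radius in km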

Does anyone know of another (open-source, free) way to calculate actual cycling distances, or how to use the Google Distance Matrix API more efficiently?

Thanks,

So here is some more information, as suggested. I am trying to calculate distances between "stations", and am given latitudes and longitudes for about 300K pairs. I was going to set up a function and then apply it to the dataframe (bear with me, I'm still new to Python and dataframes), but for now I am using a loop to go through all the pairs. Here is my code:

i = 0
while i < len(result):  # loop over every station pair
    # build "lat long" strings the way gmaps expects them
    from_coords = str(result.loc[i, 'from_lat']) + " " + str(result.loc[i, 'from_long'])
    to_coords = str(result.loc[i, 'to_lat']) + " " + str(result.loc[i, 'to_long'])
    # mode='bicycling' so the route follows streets suitable for cycling
    distance = gmaps.distance_matrix([from_coords],  # origin lat & long
                                     [to_coords],    # destination lat & long
                                     mode='bicycling')['rows'][0]['elements'][0]
    # write to this row only; result['distance'] = ... would overwrite
    # the whole column on every iteration
    result.loc[i, 'distance'] = distance['distance']['value']

    # debug: print the row # and distance every 1,000 rows (was 10,000)
    # ... it's running very slowly on an i9-9900K with 48GB RAM. Why?
    if i % 1000 == 0:
        print(i, distance['distance']['value'])
    i += 1
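
On the efficiency part of the question: each iteration above is its own HTTPS round trip, and network latency, not the CPU, is what makes the loop crawl. The Distance Matrix API accepts lists of origins and destinations in a single request (at most 100 elements per request, where elements = origins × destinations), so one rough, untested sketch is to send chunks of 10 origins against 10 destinations and keep only the diagonal. Billing is per element, so this trades about 10x fewer round trips for 10x more billed elements:

k = 10  # 10 x 10 = 100 elements, the per-request maximum
distances = []
for start in range(0, len(result), k):
    chunk = result.iloc[start:start + k]
    origins = list(zip(chunk['from_lat'], chunk['from_long']))
    destinations = list(zip(chunk['to_lat'], chunk['to_long']))
    resp = gmaps.distance_matrix(origins, destinations, mode='bicycling')
    # rows[i]['elements'][j] is origin i -> destination j; the pairwise
    # distances we want sit on the diagonal (assumes every element is OK)
    for i, row in enumerate(resp['rows']):
        distances.append(row['elements'][i]['distance']['value'])
result['distance'] = distances
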
Will F
  • Sounds like a very useful education in the value (cost/benefit-wise) of web APIs. What research have you done into alternative methods? Is Python the issue (would some other language be notably faster?), or is the API the main consumer of your application's execution time? – DisappointedByUnaccountableMod Dec 11 '19 at 23:32
  • What you're asking for here is going to be way beyond what you can expect to get from a free API. I'd recommend looking at the different routing engines that have been developed for [OpenStreetMap](https://wiki.openstreetmap.org/wiki/Routing). You should be able to set some of these up to run locally, as opposed to having to rely on an external server (see the OSRM sketch just after these comments). – Joe Habel Dec 11 '19 at 23:41
  • It's difficult to comment on the efficiency of your use of the API with practically no information on how you're using it. – AMC Dec 12 '19 at 01:31
  • I believe the issue is with the Google Distance Matrix API, though it may also be that I was running a loop instead of applying a function. As I said, when I changed the loop to use Euclidean geometry to calculate 'as the crow flies' distance, it completed all 300K rows in the blink of an eye. I did look into other APIs, but the free/open-source ones seem to be limited in usage (anywhere from 5K to 10K requests daily or monthly... not enough). So for this project I may just have to be content with a less ideal solution. – Will F Dec 12 '19 at 04:43
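
Following up on the OpenStreetMap suggestion in the comments: OSRM is one of those routing engines, and running it locally removes the request quota entirely. A minimal sketch of its HTTP route service, assuming an OSRM instance built with the bicycle profile is listening on localhost:5000 (host and port are assumptions; routes[0].distance in meters is OSRM's documented response field):

import requests

def osrm_distance_m(from_lat, from_long, to_lat, to_long):
    # OSRM expects lon,lat order; the profile segment of the URL is
    # informational -- osrm-routed serves whatever profile its data
    # was built with
    url = (f"http://localhost:5000/route/v1/bicycle/"
           f"{from_long},{from_lat};{to_long},{to_lat}")
    resp = requests.get(url, params={'overview': 'false'}).json()
    return resp['routes'][0]['distance']  # meters

OSRM also has a table service for many-to-many requests, which would cut the number of HTTP calls well below 300K.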

1 Answer


You could approximate the distance in km with the haversine (great-circle) distance.

Here I use random lat/long pairs as stand-in data: random_distances, a NumPy array of shape (300000, 2):

import numpy as np
from sklearn.neighbors import DistanceMetric

# the haversine metric expects [lat, lon] pairs in radians
dist = DistanceMetric.get_metric('haversine')

# 300,000 random lat/long pairs as stand-in data
random_distances = np.random.random((300000, 2))

Then we can approximate the distances between consecutive pairs with:

# one distance per consecutive pair of points, so shape[0] - 1 entries
distances = np.zeros(random_distances.shape[0] - 1)

for idx in range(random_distances.shape[0] - 1):
    # pairwise() returns a full 2x2 matrix; the off-diagonal entry
    # [0][1] is the distance between the two points
    pair = np.radians(random_distances[idx:idx + 2])
    distances[idx] = dist.pairwise(pair, pair)[0][1]

distances *= 6371  # scale by the mean Earth radius (6371 km) to get km

distances now holds the approximated distances in km.

  • It is alright in speed, but can be improved: the for loop can be removed entirely (see the vectorized sketch below), and each pairwise() call computes a 2x2 matrix of which only one entry is used.
  • The haversine distance is a good approximation, but not exact, which I imagine the API is:

From sklearn:

As the Earth is nearly spherical, the haversine formula provides a good approximation of the distance between two points of the Earth surface, with a less than 1% error on average.
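
Picking up the first bullet point: the loop disappears entirely if the haversine formula is written out in plain NumPy. A sketch, applied directly to the question's from/to columns (same 6371 km mean Earth radius; the distance_km column name is made up for the example):

import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    # inputs in degrees; returns great-circle distance in km
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * np.arcsin(np.sqrt(a))

result['distance_km'] = haversine_km(result['from_lat'], result['from_long'],
                                     result['to_lat'], result['to_long'])

On 300K rows this stays inside vectorized NumPy operations and finishes in a fraction of a second.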

Willem Hendriks