
I am having difficulty increasing the number of requests I can make per second with the Google Maps Geocoder. I am using a paid account ($0.50 per 1,000 requests), so according to the Google Geocoding API documentation I should be able to make up to 50 requests per second.
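The 50 requests/second figure is a ceiling, not a rate a loop reaches by itself. If requests are ever sent in parallel, the client has to keep its own start rate under that ceiling. Below is a minimal thread-safe limiter sketch. The class name `RateLimiter` is hypothetical, and the 50/s value is the figure quoted above, so check it against your own project's quota.

```python
import threading
import time


class RateLimiter:
    """Hypothetical helper: lets at most `rate` calls start per second.

    Sketch only; the 50/s figure comes from the question and should be
    checked against your own quota before relying on it.
    """

    def __init__(self, rate: float) -> None:
        self.interval = 1.0 / rate          # minimum gap between call starts
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()  # earliest time the next call may start

    def wait(self) -> None:
        # Reserve the next start slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.interval
        delay = slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)
```

Each worker would call `limiter.wait()` immediately before sending its request, e.g. with `limiter = RateLimiter(50)` shared by all threads.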

I have a list of 15k addresses that I am trying to get GPS coordinates for. I am storing them in a pandas DataFrame and looping over them. To make sure the problem wasn't slow looping, I timed a loop over all 15k addresses without making any requests, and it took only 1.5 seconds. But with requests I was only able to make fewer than 1 request per second. I suspected my slow internet connection, so I started a Windows Google Cloud VM, which has a much faster connection. That raised the rate to about 1.5 requests/second, but this is still far below the theoretical limit.
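The loop timing does rule out pandas, but it also points at the real limit: a loop that waits for each response before sending the next can never exceed 1/latency requests per second. Here is a small worked sketch using the figures from this question. The ~0.67 s round trip is inferred from the observed 1.5 requests/second, not measured, and the function names are illustrative.

```python
def sequential_rate(latency_s: float) -> float:
    """Requests/second for a loop that waits for each response before the next request."""
    return 1.0 / latency_s


def required_in_flight(target_rate: float, latency_s: float) -> float:
    """Little's law: concurrent requests needed to sustain `target_rate`."""
    return target_rate * latency_s


def total_seconds(n_requests: int, rate: float) -> float:
    """Wall-clock time to send `n_requests` at `rate` requests/second."""
    return n_requests / rate


# Figures from the question: 15k addresses, ~1.5 requests/s observed, 50/s allowed.
latency = 1.0 / 1.5  # ~0.67 s per round trip, inferred from 1.5 requests/s
print(f"sequential: {total_seconds(15_000, sequential_rate(latency)) / 3600:.1f} h")
print(f"at 50/s:    {total_seconds(15_000, 50) / 60:.1f} min")
print(f"in flight needed for 50/s: {required_in_flight(50, latency):.0f}")
```

With those numbers the sequential loop needs roughly 2.8 hours for 15k addresses. At 50 requests/second the same work would take about 5 minutes, and by Little's law that rate needs about 33 requests in flight at once.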

I thought this might be caused by the Python library geocoder, so I tried making the request directly with Python requests, but that didn't speed things up either.
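Whichever client sends the request, it also helps to check the response's `status` field instead of indexing straight into the first result. That way a quota error is not recorded as a missing address. Below is a minimal parsing sketch for the Geocoding API's JSON response. The helper names are hypothetical, and only the `OK` and `ZERO_RESULTS` statuses are treated as normal outcomes.

```python
from typing import Optional, Tuple

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_params(address: str, api_key: str) -> dict:
    """Hypothetical helper: query parameters for one Geocoding API request."""
    return {"address": address, "key": api_key}


def parse_geocode_response(payload: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) of the first result, or None if the address was not found.

    Any other status (e.g. OVER_QUERY_LIMIT, REQUEST_DENIED) raises, so it is
    not silently stored as a miss.
    """
    status = payload.get("status")
    if status == "OK" and payload.get("results"):
        loc = payload["results"][0]["geometry"]["location"]
        return loc["lat"], loc["lng"]
    if status == "ZERO_RESULTS":
        return None
    raise RuntimeError(f"Geocoding API returned status {status!r}")


# Usage with requests (network call, not run here):
#     payload = requests.get(GEOCODE_URL, params=build_params(addr, api_key), timeout=10).json()
#     latlng = parse_geocode_response(payload)
```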

Does this have something to do with the fact that I'm not using a server? I would think that wouldn't matter, since I'm using a Google Cloud VM. Also, I know this isn't a multithreading issue, since a single core can already iterate through the loop extremely quickly. Thanks in advance for any thoughts.
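A fast empty loop only shows that the Python side is cheap. Once requests are added, each iteration spends most of its time waiting for Google's response. Threads can overlap that waiting, because Python releases the GIL during network I/O. Below is a minimal sketch with the standard library's `ThreadPoolExecutor`. It assumes a hypothetical `fetch` callable that geocodes one address, for example a thin wrapper around `geocoder.google`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def geocode_concurrently(
    addresses: Sequence[str],
    fetch: Callable[[str], T],
    max_workers: int = 20,
) -> List[T]:
    """Run `fetch` on every address with up to `max_workers` calls in flight.

    `fetch` is a hypothetical one-address geocoding function. Results are
    returned in the same order as `addresses`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps input order even though calls finish out of order
        return list(pool.map(fetch, addresses))
```

Keep `max_workers` low enough that the combined request rate stays within your per-second quota.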

import time

import geocoder
import pandas as pd
import requests


startTime = time.time()
# Read the export with all transactions up to October 4th
input_filename = "C:/Users/username/Downloads/transaction-export 10-04-2017.csv"
# on_bad_lines='skip' replaces error_bad_lines=False, which pandas 2.0 removed
df = pd.read_csv(input_filename, header=0, on_bad_lines='skip')
# Only look at customer addresses (a Series from here on)
df = df['Customer Address']
# Drop duplicates and NAs
df = df.drop_duplicates(keep='first')
df = df.dropna()
# Convert the Series to a plain list of address strings
addresses = df.tolist()
# Google API key
api_key = 'my_api_key'
# Collected [lat, lng] pairs (0 on failure)
address_gps = []
# Google Geocoding API endpoint (used by the direct-request variant below)
url = 'https://maps.googleapis.com/maps/api/geocode/json'
# For each address, return its geocoded lat/lng coordinates
for i, val in enumerate(addresses):
    ''' Direct way to make the call without geocoder:
    params = {'sensor': 'false', 'address': val, 'key': api_key}
    r = requests.get(url, params=params)
    results = r.json()['results']
    location = results[0]['geometry']['location']
    print(location['lat'], location['lng'])
    '''
    endTime = time.time()
    g = geocoder.google(val, key=api_key, exactly_one=True)
    print("Address,", val, "Number,", i, "Total,", len(addresses), "Time,", endTime - startTime)

    if g.ok:
        address_gps.append(g.latlng)
        print(g.latlng)
    else:
        address_gps.append(0)
        print("Error")
    # Save every 100 iterations
    if i % 100 == 0:
        df1 = pd.DataFrame({'Address GPS': address_gps})
        df1.to_csv('C:/Users/username/Downloads/AllCustomerAddressAsGPS.csv')


# Final save as CSV
df1 = pd.DataFrame({'Address GPS': address_gps})
df1.to_csv('C:/Users/username/Downloads/AllCustomerAddressAsGPS.csv')
  • I'm pretty sure you can pass more than one address at a time – DJK Oct 05 '17 at 00:42
  • Hmm. That is an interesting approach, but I am having difficulty figuring out how to do that. I tried with a string array, but I am getting an error for each call. –  Oct 05 '17 at 01:41
  • According to my research you cannot actually pass more than one address at a time. –  Oct 05 '17 at 01:49
  • +1 same issue. Seems like `geocoder` is adding a 1 sec pause after the request. Not sure how to remove that. You can write your own using `requests` though. – AlexM Jun 27 '18 at 20:48

1 Answer

One way to speed this up is to reuse a single requests session with Google rather than creating a new session for every request. The session keeps the underlying connection open between calls. This approach is suggested in the geocoder documentation.
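The same idea also works without the geocoder wrapper. Below is a minimal sketch that sends every Geocoding API call through one session object. The function name `geocode_all` is hypothetical. `session` is assumed to be a `requests.Session`, but the function only uses its `.get()` method, so any compatible object works.

```python
from typing import Any, Dict, Iterable, List

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_all(session: Any, addresses: Iterable[str], api_key: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: geocode each address through one shared session.

    Reusing `session` lets its connection pool keep the HTTPS connection open,
    whereas a bare requests.get() per address builds and discards a new
    session (and connection) on every call.
    """
    payloads = []
    for address in addresses:
        resp = session.get(GEOCODE_URL, params={"address": address, "key": api_key}, timeout=10)
        resp.raise_for_status()  # surface HTTP-level errors instead of bad JSON later
        payloads.append(resp.json())
    return payloads


# Usage (real network call, not run here):
#     import requests
#     with requests.Session() as session:
#         payloads = geocode_all(session, addresses, api_key)
```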

Your modified code will then be:

import time

import geocoder
import pandas as pd
import requests

# `addresses` is the list of address strings built in the question's code
startTime = time.time()
# Google API key
api_key = 'my_api_key'
# Collected [lat, lng] pairs (0 on failure)
address_gps = []
# Google Geocoding API endpoint
url = 'https://maps.googleapis.com/maps/api/geocode/json'
# Reuse one session (and its open connection) for every request
with requests.Session() as session:
    for i, val in enumerate(addresses):
        ''' Direct way to make the call without geocoder, reusing the same session:
        params = {'sensor': 'false', 'address': val, 'key': api_key}
        r = session.get(url, params=params)
        results = r.json()['results']
        location = results[0]['geometry']['location']
        print(location['lat'], location['lng'])
        '''
        endTime = time.time()
        g = geocoder.google(val, key=api_key, exactly_one=True, session=session)
        print("Address,", val, "Number,", i, "Total,", len(addresses), "Time,", endTime - startTime)

        if g.ok:
            address_gps.append(g.latlng)
            print(g.latlng)
        else:
            address_gps.append(0)
            print("Error")
        # Save every 100 iterations
        if i % 100 == 0:
            df1 = pd.DataFrame({'Address GPS': address_gps})
            df1.to_csv('C:/Users/username/Downloads/AllCustomerAddressAsGPS.csv')

# Final save as CSV
df1 = pd.DataFrame({'Address GPS': address_gps})
df1.to_csv('C:/Users/username/Downloads/AllCustomerAddressAsGPS.csv')
AlexM