I am having difficulty increasing the number of requests per second I can make with the Google Maps Geocoder. I am using a paid account (at $0.50 per 1,000 requests), so according to the Google Geocoder API documentation I should be able to make up to 50 requests per second.
I have a list of 15k addresses that I am trying to get GPS coordinates for. I am storing them in a pandas DataFrame and looping over them. To make sure the slowdown wasn't caused by the loop itself, I timed an iteration over all 15k addresses, and it took only 1.5 seconds. Even so, I was able to make fewer than 1 request per second. I thought this might be due to my slow internet connection, so I started a Windows Google Cloud VM, which should have a fast connection. That raised the rate to about 1.5 requests per second, which is still far below the theoretical limit.
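To put those rates in perspective, here is a rough back-of-the-envelope calculation that uses only the figures above (15k addresses, roughly 1 and 1.5 requests per second observed, 50 per second as the documented limit). The helper name `total_minutes` is made up for illustration.

```python
def total_minutes(n_addresses, requests_per_second):
    """Return the minutes needed to geocode n_addresses at a steady rate."""
    return n_addresses / requests_per_second / 60.0


N_ADDRESSES = 15000  # size of the address list described above

# Rates taken from the post: two observed rates and the documented limit.
rates = [
    ("home connection (about 1 req/s)", 1.0),
    ("Google Cloud VM (about 1.5 req/s)", 1.5),
    ("documented limit (50 req/s)", 50.0),
]

for label, rate in rates:
    print("%s: %.1f minutes" % (label, total_minutes(N_ADDRESSES, rate)))
```

At 1.5 requests per second the full list takes roughly 167 minutes, compared with 5 minutes at the documented limit.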
I thought the Python library geocoder might be the cause, so I tried making the request directly with the Python requests library, but that didn't speed things up either.
Does this have something to do with the fact that I'm not using a server? I would think it wouldn't matter, since I'm using a Google Cloud VM. Also, I don't think this has to do with multithreading, since a single core can already iterate through the loop very quickly. Thanks in advance for any thoughts.
import geocoder
import pandas as pd
import time
import requests
startTime = time.time()
#Read File Name with all transactions up to October 4th
input_filename = "C:/Users/username/Downloads/transaction-export 10-04-2017.csv"
df = pd.read_csv(input_filename, header=0, error_bad_lines=False)
#Only look at customer addresses
df = df['Customer Address']
#Drop duplicates and NAs
df = df.drop_duplicates(keep='first')
df = df.dropna()
# Convert the pandas Series of addresses to a plain Python list
addresses = df.tolist()
#Google Api Key
api_key = 'my_api_key'
# Create an empty list to hold the geocoded results
address_gps = []
#google api address
url = 'https://maps.googleapis.com/maps/api/geocode/json'
#For each address return its geocoded latlng coordinates
for i, val in enumerate(addresses):
    # Direct way to make the call without geocoder:
    # params = {'sensor': 'false', 'address': val, 'key': api_key}
    # r = requests.get(url, params=params)
    # results = r.json()['results']
    # location = results[0]['geometry']['location']
    # print(location['lat'], location['lng'])
    g = geocoder.google(val, key=api_key, exactly_one=True)
    # Elapsed time since the script started, taken after this request returns
    endTime = time.time()
    print("Address, %s Number, %d Total, %d Time, %s" % (val, i, len(addresses), endTime - startTime))
    if g.ok:
        address_gps.append(g.latlng)
        print(g.latlng)
    else:
        # Keep a placeholder so results stay aligned with the address list
        address_gps.append(0)
        print("Error")
    # Save every 100 iterations
    if i % 100 == 0:
        # Save partial results as CSV
        df1 = pd.DataFrame({'Address GPS': address_gps})
        df1.to_csv('C:/Users/username/Downloads/AllCustomerAddressAsGPS.csv')
# Save the final results as CSV
df1 = pd.DataFrame({'Address GPS': address_gps})
df1.to_csv('C:/Users/username/Downloads/AllCustomerAddressAsGPS.csv')
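One experiment that might help separate loop speed from per-request waiting time is sketched below. It does not call Google at all: `fake_geocode` is a hypothetical stand-in that only sleeps, the 0.05 second latency is an assumed value rather than a measurement, and the worker count of 10 is arbitrary. It times the same 20 lookups one after another and then through `concurrent.futures.ThreadPoolExecutor` (Python 3).

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY = 0.05  # assumed round-trip time in seconds, not measured


def fake_geocode(address):
    """Hypothetical stand-in for one geocoding request; it only sleeps."""
    time.sleep(SIMULATED_LATENCY)
    return (0.0, 0.0)  # placeholder coordinates


def time_sequential(addresses):
    """Run the lookups one after another and return (results, seconds)."""
    start = time.perf_counter()
    results = [fake_geocode(a) for a in addresses]
    return results, time.perf_counter() - start


def time_threaded(addresses, workers=10):
    """Run the lookups on a thread pool and return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the addresses
        results = list(pool.map(fake_geocode, addresses))
    return results, time.perf_counter() - start


sample = ["address %d" % n for n in range(20)]
seq_results, seq_seconds = time_sequential(sample)
thr_results, thr_seconds = time_threaded(sample)
print("sequential: %.2f s, threaded: %.2f s" % (seq_seconds, thr_seconds))
```

If swapping `fake_geocode` for a real request shows a similar gap, that would suggest the time is being spent waiting on each response rather than in the loop itself. Any real test would still need to stay within the API's rate limit.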