I am just getting started learning how to use the requests module in Python to fetch data from an API. I will be calling this API with a very simple GET request, but I will need to do it 500,000+ times, passing a different value for each request. The response is a JSON object, which I can easily parse for what I need.
The issue is that my current approach, a for-loop using requests, is far too slow. As I understand it, this approach sends a request, waits for the response to complete, and only then moves on to the next item in the iterable.
To solve this, I came across the grequests module, which supports asynchronous requests. With this approach, I'm hoping to start many queries at the same time, maybe in batches of 100 or so. Ideally, this would let me move through my large iterable much more quickly.
After reading the documentation and a couple of examples, I created the hypothetical example below. Obviously this is a much smaller dataset, so I have not included the part that will break the URLs into smaller chunks to submit at once. I'm hoping to use this sample dataset to prove my method before moving on to my real dataset.
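For reference, the omitted chunking step could look roughly like the sketch below. It assumes batches of 100, as mentioned above; the helper name `chunked` and the placeholder URLs are hypothetical and not part of the real dataset. It only splits the iterable and sends no requests.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical stand-in for the real list of 500,000+ request URLs.
example_urls = ['https://example.invalid/item/%d' % i for i in range(250)]

batches = list(chunked(example_urls, 100))
print([len(b) for b in batches])  # [100, 100, 50]
```

Each batch could then be passed to whichever sending function is being tested, one batch at a time, so that only a bounded number of requests is in flight at once.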
Currently, with the code below and using timeit, the resulting runtimes for each method are:
- for-loop and requests = 16.9 s
- grequests async with mapping = 13.9 s
My question is: if the grequests method starts all of the requests at the same time, why isn't it substantially faster? Furthermore, does anyone have suggestions for a better way to submit multiple requests at the same time?
# coding: utf-8
# In[1]:
import grequests
import requests
# In[2]:
# set up session
s = requests.session()
# In[3]:
# get a list of airports
airports = ['ATL', 'ORD', 'LAX', 'DFW', 'DEN', 'JFK', 'IAH', 'SFO', 'LAS', 'PHX',
            'CLT', 'MIA', 'MCO', 'EWR', 'DTW', 'MSP', 'SEA', 'PHL', 'BOS', 'LGA',
            'IAD', 'BWI', 'FLL', 'SLC', 'HNL', 'DCA', 'MDW', 'SAN', 'TPA', 'PDX',
            'STL', 'MCI', 'MEM', 'MKE', 'OAK', 'CLE', 'RDU', 'BNA', 'SMF', 'HOU',
            'SNA', 'AUS', 'MSY', 'SJC', 'PIT', 'SAT', 'CVG', 'DAL', 'IND']
# In[4]:
# build query string
def build_request(airport):
    base_url = 'https://services.faa.gov/airport/status/'
    request_string = base_url + airport + '/?format=application/json'
    return request_string
# In[5]:
# create the request strings for all airports
urls = [build_request(a) for a in airports]
# In[7]:
def try_grequests(urls):
    # create a set of unsent requests
    rs = (grequests.get(u) for u in urls)
    # send them all at the same time
    data = grequests.map(rs)
    return data
# In[10]:
def try_requests(urls):
    # send requests one by one
    data = [s.get(u).json() for u in urls]
    return data
# In[11]:
# time how long it takes using grequests
get_ipython().magic(u'timeit try_grequests(urls)')
# In[12]:
# time how long it takes using requests
get_ipython().magic(u'timeit try_requests(urls)')