
I'm currently using an API that rate-limits me to 3000 requests per 10 seconds. I have 10,000 URLs that are fetched using Tornado, due to its asynchronous IO.

How do I go about implementing a rate limit to reflect the API limit?

from tornado import ioloop, httpclient

i = 0

def handle_request(response):
    print(response.code)
    global i
    i -= 1
    if i == 0:
        ioloop.IOLoop.instance().stop()

http_client = httpclient.AsyncHTTPClient()
for url in open('urls.txt'):
    i += 1
    http_client.fetch(url.strip(), handle_request, method='HEAD')
ioloop.IOLoop.instance().start()
user3808357

1 Answer


You can check which 3000-request batch the value of i falls in. For example, while i is between 3001 and 6000, schedule each request with a 10-second delay. For the batch after 6000, double the delay, and so on.

import functools

from tornado import ioloop, httpclient

http_client = httpclient.AsyncHTTPClient()

timeout = 10      # delay (seconds) for the second batch of 3000
interval = 3000   # requests allowed per 10-second window

i = 0
for url in open('urls.txt'):
    i += 1
    if i <= interval:
        # i is at most 3000:
        # just fetch the request without any delay
        http_client.fetch(url.strip(), handle_request, method='GET')
        continue  # skip the rest of the loop

    if i % interval == 1 and i > interval + 1:
        # i is now 6001, or 9001, and so on:
        # double the delay for the next 3000 calls
        timeout += timeout

    loop = ioloop.IOLoop.current()
    loop.call_later(timeout, functools.partial(
        http_client.fetch, url.strip(), handle_request, method='GET'))

Note: I have only tested this code with a small number of requests. Also be aware that handle_request decrements the global i, so its value can change out from under the scheduling logic. If that happens in your setup, maintain a separate counter for scheduling and perform the subtraction on the other one in handle_request.
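To make that note concrete, here is a minimal sketch of the batch-delay arithmetic with the scheduling counter kept separate from the completion counter. The names (delay_for, INTERVAL, WINDOW) are mine for illustration, not from the post:

```python
INTERVAL = 3000   # requests allowed per rate-limit window
WINDOW = 10       # window length in seconds

def delay_for(scheduled):
    """Delay in seconds before issuing request number `scheduled`
    (1-based). Only this scheduling counter is used here; the
    completion counter decremented in handle_request never touches
    this arithmetic, so callbacks cannot disturb it."""
    batch = (scheduled - 1) // INTERVAL  # 0 for the first 3000, 1 for the next, ...
    if batch == 0:
        return 0                         # first batch fires immediately
    return WINDOW * (2 ** (batch - 1))   # 10 s, 20 s, 40 s, ... per batch
```

In the loop above you would then call loop.call_later(delay_for(i), ...) using a counter that only ever increases, while a second counter tracks outstanding responses and stops the IOLoop when it reaches zero.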

xyres