I'm building a distributed crawling mechanism and want to make sure that no more than 30 requests are made to the server in one minute. Each enqueued task makes one request.
All tasks are enqueued in Redis and are dequeued using the API provided by python-rq.
The approach is to set a key in Redis that expires every minute, holding the number of requests sent.
Each time a piece of work is available, check whether requests sent < 30:
- If yes, do the work
- If no, sleep for a minute
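Schematically, the per-task flow I'm aiming for is the following (a standalone sketch: `FakeRedis`, `allowed`, and `KEY` are illustrative names, and the fake client stands in for a real Redis connection so this runs without a server):

```python
# Sketch of the intended flow: a counter key that expires after 60 s,
# checked before each task. FakeRedis is an in-memory stand-in for a
# real Redis client (expiry is not simulated here).
class FakeRedis:
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def setex(self, key, seconds, value):
        self.store[key] = value      # TTL ignored in this stand-in

    def incr(self, key):
        self.store[key] = int(self.store.get(key, 0)) + 1
        return self.store[key]


redis = FakeRedis()
KEY = 'app:requests_sent_in_last_minute'

def allowed():
    """True if fewer than 30 requests were sent in the current window."""
    count = redis.get(KEY)
    return count is None or int(count) < 30

sent = 0
for _ in range(40):                  # simulate 40 pieces of work arriving
    if allowed():
        if redis.get(KEY) is None:
            redis.setex(KEY, 60, 1)  # first request opens a fresh window
        else:
            redis.incr(KEY)
        sent += 1
    # else: would sleep until the key expires and a new window starts

print(sent)  # only 30 of the 40 simulated tasks get through
```

Within a single one-minute window, exactly 30 of the 40 simulated tasks are allowed; the rest would wait for the key to expire.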
Following is my custom worker:
    #!/usr/bin/env python
    import sys
    import time

    from rq import Connection, Worker
    from redis import Redis

    redis = Redis()

    def should_i_work():
        r = redis.get('app:requests_sent_in_last_minute')
        if r == None:
            redis.setex('app:requests_sent_in_last_minute', 1, 60)
        return r == None or int(r) < 30

    def increment_requests():
        r = int(redis.get('app:requests_sent_in_last_minute'))
        redis.set('app:requests_sent_in_last_minute', r + 1)

    def main(qs):
        with Connection():
            try:
                while True:
                    if should_i_work():
                        increment_requests()
                        w = Worker(qs)
                        w.work()
                    else:
                        time.sleep(60)
            except KeyboardInterrupt:
                pass

    if __name__ == '__main__':
        qs = sys.argv[1:] or ['default']
        main(qs)
This doesn't seem to work: the worker processes tasks at its usual speed regardless of the counter, and the value of the key never gets updated beyond 3.
I have a strong feeling that my thought process is flawed. What am I doing wrong here?
Thanks