
I'm using Huey/Hueyx as a queue to buffer resources I need to GET with Python's requests library. Credentials (a token) are stored in Redis; the task executor calls the function, fetches the resource, and indexes it into Elasticsearch. Because I need to request millions of different resources and my solution is quite slow (around 5k requests per minute), the queue backs up very quickly and performance takes a hit.

def get_raw_resource(url, token, id):
    try:
        url = f"{url}/api/v2/{id}"

        headers = {
            'Accept': "application/json",
            'Authorization': f"Bearer {token}",
            'Cache-Control': "no-cache"
        }
        response = requests.get(url, headers=headers)
    [snip]
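One easy win before touching async: creating a fresh connection for every `requests.get` pays the TCP/TLS handshake cost millions of times over. A sketch of reusing one `requests.Session` per worker, assuming the token is stable enough to set once as a default header (`make_session` and `pool_size` are names I'm introducing here, not part of the original code):

```python
import requests
from requests.adapters import HTTPAdapter

def make_session(token, pool_size=50):
    """Build a Session that pools connections and carries the auth headers.

    Hypothetical helper: call this once per worker and reuse the returned
    session for every GET, instead of calling requests.get() directly.
    """
    session = requests.Session()
    session.headers.update({
        'Accept': "application/json",
        'Authorization': f"Bearer {token}",
        'Cache-Control': "no-cache",
    })
    # Widen the connection pool so many concurrent thread workers can
    # share the session without blocking on a free connection.
    adapter = HTTPAdapter(pool_connections=pool_size, pool_maxsize=pool_size)
    session.mount('https://', adapter)
    session.mount('http://', adapter)
    return session
```

With this in place, `get_raw_resource` would take a session argument and call `session.get(url)` with no per-call headers, so keep-alive connections to the API host are reused across requests.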


@res_q.task(delay=10, retries=3, retry_delay=180)
def get_resource_es(client_id, id):
    [snipped redis stuff]
    try:
        res = get_raw_resource(url, token, id)
        parsed = parse_res(res)
        es.index(index=es_index, body=parsed, id=id)
    except Exception as e:
        raise e

My question is: how can I make this whole ordeal faster with some easy steps? Adding more consumers is always an option, but it's not something I want to do before exploring other options. I've read about async code, but it looks like it would quadruple the complexity of the code, and I know nothing about the async programming model.
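Since a full async rewrite is off the table, one middle ground worth sketching: enqueue IDs in batches and fan each batch out over a thread pool inside a single task, so one dequeue covers many HTTP round trips. This is a hedged illustration of the pattern only; `fetch_batch` and `fetch_one` are hypothetical names, and the real `fetch_one` would wrap the `get_raw_resource`/`parse_res`/index steps from the question:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_batch(ids, fetch_one, max_workers=50):
    """Run fetch_one(id) concurrently for each id; results keep input order.

    Threads suit this workload because the time is spent waiting on
    network I/O, not CPU, and no async rewrite of the code is needed.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, ids))
```

A batch size of a few hundred IDs per task keeps queue overhead low while each worker overlaps many in-flight requests.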

simplex123
  • If you want to increase parallel activity you need to increase the number of workers, not the number of consumers. One consumer will work just fine. What type of workers are you using? Try experimenting with greenlets and a high number of workers; it can easily go into the hundreds. – Glenn D.J. Jun 13 '20 at 19:42
  • I'm using 150 thread workers, but increasing the number doesn't yield higher throughput anymore – simplex123 Jun 14 '20 at 09:59

0 Answers