Restarting/Rebuilding a timed out process using Pebble in Python?

Question

I am using concurrent.futures to download reports from a remote server through an API. To confirm that a report has downloaded correctly, I just have the function print out its ID.

I have an issue where, on rare occasions, a report download hangs indefinitely. I do not get a Timeout Error or a Connection Reset error; the download just sits there for hours until I kill the whole process. This is a known issue with the API, with no known workaround.

I did some research and switched to a Pebble-based approach to implement a timeout on the function. My aim is then to record the ID of the report that failed to download and start again.

Unfortunately, I ran into a bit of a brick wall as I do not know how to actually retrieve the ID of the report I failed to download. I am using a similar layout to this answer:

from pebble import ProcessPool
from concurrent.futures import TimeoutError

def sometimes_stalling_download_function(report_id):
    ...
    return report_id

with ProcessPool() as pool:
    future = pool.map(sometimes_stalling_download_function, report_id_list, timeout=10)

    iterator = future.result()

    while True:
        try:
            result = next(iterator)
        except StopIteration:
            break
        except TimeoutError as error:
            print("function took longer than %d seconds" % error.args[1])
            #Retrieve report ID here
            failed_accounts.append(result)

What I want to do is retrieve the report ID in the event of a timeout, but it does not seem to be reachable from that exception. Is it possible to have the function output the ID anyway when a timeout occurs, or will I have to rethink how I am downloading the reports entirely?

score 1 · Answer 1 · answered Mar 23 '19 at 14:24

The map function returns a future whose result() is an iterator that yields the results in the same order they were submitted.

Therefore, to find out which report_id caused the timeout, you can simply check its position in report_id_list.

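failed_accounts = []
index = 0  # position of the current result in report_id_list

while True:
    try:
        result = next(iterator)
    except StopIteration:
        break
    except TimeoutError as error:
        print("function took longer than %d seconds" % error.args[1])
        # The timed-out call is the one submitted at this position
        failed_accounts.append(report_id_list[index])
    finally:
        # Runs after every iteration, whether it succeeded or timed out
        index += 1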
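
The same idea can be wrapped in a small helper that is easy to test without a real pool. The following is only a sketch: collect_timeouts and FakeResults are hypothetical names that are not part of Pebble, and FakeResults merely imitates an iterator that raises TimeoutError for chosen IDs. It assumes, as described above, that results arrive in submission order.

```python
from concurrent.futures import TimeoutError


def collect_timeouts(result_iterator, submitted_ids):
    """Pair each timeout with the ID submitted at the same position.

    Hypothetical helper; assumes results are yielded in submission order.
    """
    completed, timed_out = [], []
    for report_id in submitted_ids:
        try:
            completed.append(next(result_iterator))
        except StopIteration:
            break
        except TimeoutError:
            # No result is available here, so fall back to the submitted ID
            timed_out.append(report_id)
    return completed, timed_out


class FakeResults:
    """Stand-in for the pool's result iterator (for illustration only)."""

    def __init__(self, ids, stalled):
        self._ids = iter(ids)
        self._stalled = set(stalled)

    def __iter__(self):
        return self

    def __next__(self):
        report_id = next(self._ids)  # StopIteration propagates when exhausted
        if report_id in self._stalled:
            raise TimeoutError("Task timeout", 10)
        return report_id


ids = ["r1", "r2", "r3", "r4"]
completed, timed_out = collect_timeouts(FakeResults(ids, stalled={"r2"}), ids)
print("completed:", completed)
print("timed out:", timed_out)
```

With a real pool, the iterator returned by future.result() would take the place of FakeResults, and the timed_out list could then be passed to another pool.map call to retry those reports.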
