I have a Python script that makes 800,000 HTTP requests to verify they return 200s. For any 404, I capture the URL path in a variable. The URL is parameterized to take 800,000 different ids. To save time I split the work across 100 threads, and at the end I join them all to get the number of URLs that 404'ed, etc.
The problem is that it takes about two hours to finish, and I have to wait until the end to see the results. I'd like to know at any point during the run how many ids have been processed so far, how many have 404'ed, and so on. How can I do that?
from math import ceil

# Split the ids evenly across the worker threads
runners = []
nthreads = 100
chunk_size = int(ceil(len(ids) / float(nthreads)))
for i in range(nthreads):
    # each worker checks ids[start:end]
    start = i * chunk_size
    end = min(len(ids), (i + 1) * chunk_size)
    runners.append(HeadendChecker(start, end))

for thread in runners:
    thread.start()

# Collect the 404'ed ids once every worker has finished
list_of_bad_ids = []
for thread in runners:
    thread.join()
    if thread.get_bad_ids() is not None:
        list_of_bad_ids += thread.get_bad_ids()
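For reference, HeadendChecker isn't shown above; this is a minimal sketch of what such a worker might look like, assuming the requests library, a global ids list, and a URL_TEMPLATE that takes an id (the template name and the use of HEAD requests are my assumptions, not from the original script):

import threading
import requests

# Hypothetical parameterized URL; substitute the real one
URL_TEMPLATE = "https://example.com/items/{}"

class HeadendChecker(threading.Thread):
    """Checks ids[start:end] and remembers which ones returned 404."""

    def __init__(self, start, end):
        super().__init__()
        # avoid shadowing Thread.start() by not naming the attribute "start"
        self.start_index = start
        self.end_index = end
        self.bad_ids = []

    def run(self):
        # ids is assumed to be a module-level list, as in the driver code above
        for id_ in ids[self.start_index:self.end_index]:
            resp = requests.head(URL_TEMPLATE.format(id_))
            if resp.status_code == 404:
                self.bad_ids.append(id_)

    def get_bad_ids(self):
        return self.bad_ids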