
I have a Python script which makes 800,000 HTTP requests to check that they all return 200. For any 404s it captures the URL path in a variable. The URL is parameterized to take 800,000 different ids. I am using 100 threads to save time, and at the end I join them all to get the number of URLs that 404'ed, etc.

But it takes about 2 hours to finish, and I have to wait until the end to get the results. I would like to know, at any point during the run, how many ids have been checked so far, how many 404'ed, etc. How can I do that?

from math import ceil

runners = []
nthreads = 100

# split the ids into nthreads roughly equal chunks, one per worker thread
chunk_size = int(ceil(len(ids) / float(nthreads)))
for i in range(nthreads):
    runners.append(HeadendChecker(i * chunk_size, min(len(ids), chunk_size * (i + 1))))

for thread in runners:
    thread.start()

# wait for every worker, then merge the 404'ed ids they collected
list_of_bad_ids = []
for thread in runners:
    thread.join()
    if thread.get_bad_ids() is not None:
        list_of_bad_ids = list_of_bad_ids + thread.get_bad_ids()
Krish
  • Can't you print to console inside a thread? If not, you should have a shared variable (probably list_of_bad_ids) and, of course, take special care when accessing it from the threads. Not a python guru though, someone else will point to the right path :) – BlackBear Aug 30 '13 at 14:35
  • you can probably do it with a mutex, and a timer to periodically show the information. http://stackoverflow.com/questions/3310049/proper-use-of-mutexes-in-python – lucasg Aug 30 '13 at 14:40
  • This question makes me think, how many threads are too many? – djhoese Aug 30 '13 at 14:56
  • Thanks, let me try with mutex idea. Thanks georgesl – Krish Aug 30 '13 at 15:41
  • Anything above 100 threads gets cranky – Krish Aug 30 '13 at 15:42
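A minimal sketch of the lock-protected shared counter idea from the comments above, assuming a hypothetical check_id() helper (not part of the original code) that issues one request and returns its HTTP status code:

import threading
import time

progress_lock = threading.Lock()
done_count = 0
bad_ids = []

def worker(ids_chunk):
    global done_count
    for id_ in ids_chunk:
        status = check_id(id_)          # hypothetical request helper
        with progress_lock:             # protect the shared counters
            done_count += 1
            if status == 404:
                bad_ids.append(id_)

def reporter(total, interval=10):
    # periodically print progress until every id has been processed
    while True:
        with progress_lock:
            done, bad = done_count, len(bad_ids)
        print("%d/%d done, %d returned 404" % (done, total, bad))
        if done >= total:
            break
        time.sleep(interval)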

1 Answer


Rather than having each thread store its own 200s and 404s, you can use a queue object (Queue in Python 2, queue in Python 3).

You can turn your existing threads into producers: they produce (status, url id) tuples that are put onto a shared queue.

You can then add an analyser thread, which consumes items from this queue, prints status messages along the way, and stores the results in a convenient form for further processing (by "further processing" I mean any processing done after all the worker threads are finished). A sketch of this setup follows.
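A minimal sketch of that producer/consumer setup, again assuming a hypothetical check_id() helper that performs one request and returns its HTTP status code:

import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

results = queue.Queue()

def producer(ids_chunk):
    # worker thread: check each id and push a (status, id) tuple onto the shared queue
    for id_ in ids_chunk:
        results.put((check_id(id_), id_))

def analyser(total, bad_ids):
    # consumer thread: read results as they arrive, print progress, collect the 404s
    for n in range(1, total + 1):
        status, id_ = results.get()
        if status == 404:
            bad_ids.append(id_)
        if n % 1000 == 0:
            print("%d/%d checked, %d 404s so far" % (n, total, len(bad_ids)))

The main thread would start the producer threads on their chunks of ids, start one analyser thread with an empty bad_ids list, and join everything at the end; the analyser prints progress while the workers run, so you no longer have to wait two hours to see how things are going.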

m01