
I have a Python script which makes 800,000 HTTP requests to check that they all return 200. For any 404s it captures the URL path in a variable. The URL is parameterized to take 800,000 different ids. I am using 100 threads to save time, and at the end I join them all to get the number of URLs that 404'ed, etc.

But it takes about 2 hours to finish, and I have to wait until the end to get the results. I would like to know, at any point during the run, how many ids have been checked so far, how many 404'ed, etc. How can I do that?

from math import ceil

runners = []
nthreads = 100

# split the ids into nthreads roughly equal chunks, one per worker thread
chunk_size = int(ceil(len(ids) / float(nthreads)))
for i in range(nthreads):
    runners.append(HeadendChecker(i * chunk_size, min(len(ids), chunk_size * (i + 1))))

for thread in runners:
    thread.start()

# wait for every worker, then merge the 404'ed ids they collected
list_of_bad_ids = []
for thread in runners:
    thread.join()
    if thread.get_bad_ids() is not None:
        list_of_bad_ids = list_of_bad_ids + thread.get_bad_ids()
Krish
  • Can't you print to console inside a thread? If not, you should have a shared variable (probably list_of_bad_ids) and, of course, take special care when accessing it from the threads. Not a python guru though, someone else will point to the right path :) – BlackBear Aug 30 '13 at 14:35
  • you can probably do it with a mutex, and a timer to periodically show the information. http://stackoverflow.com/questions/3310049/proper-use-of-mutexes-in-python – lucasg Aug 30 '13 at 14:40
  • This question makes me think, how many threads are too many? – djhoese Aug 30 '13 at 14:56
  • Thanks, let me try with mutex idea. Thanks georgesl – Krish Aug 30 '13 at 15:41
  • Anything above 100 threads gets cranky – Krish Aug 30 '13 at 15:42
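A minimal sketch of the lock-protected shared counter idea from the comments above, assuming a hypothetical check_id() helper (not part of the original code) that issues one request and returns its HTTP status code:

import threading
import time

progress_lock = threading.Lock()
done_count = 0
bad_ids = []

def worker(ids_chunk):
    global done_count
    for id_ in ids_chunk:
        status = check_id(id_)          # hypothetical request helper
        with progress_lock:             # protect the shared counters
            done_count += 1
            if status == 404:
                bad_ids.append(id_)

def reporter(total, interval=10):
    # periodically print progress until every id has been processed
    while True:
        with progress_lock:
            done, bad = done_count, len(bad_ids)
        print("%d/%d done, %d returned 404" % (done, total, bad))
        if done >= total:
            break
        time.sleep(interval)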

1 Answer


Rather than having each thread store its own 200s and 404s, you can use a queue object (Queue in Python 2, queue in Python 3).

You can turn your existing threads into producers: they produce (status, url id) tuples that are put onto a shared queue.

You can then add an analyser thread, which consumes items from this queue, prints status messages along the way, and stores the results in a convenient form for further processing (by "further processing" I mean any processing done after all the worker threads are finished). A sketch of this setup follows.
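A minimal sketch of that producer/consumer setup, again assuming a hypothetical check_id() helper that performs one request and returns its HTTP status code:

import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

results = queue.Queue()

def producer(ids_chunk):
    # worker thread: check each id and push a (status, id) tuple onto the shared queue
    for id_ in ids_chunk:
        results.put((check_id(id_), id_))

def analyser(total, bad_ids):
    # consumer thread: read results as they arrive, print progress, collect the 404s
    for n in range(1, total + 1):
        status, id_ = results.get()
        if status == 404:
            bad_ids.append(id_)
        if n % 1000 == 0:
            print("%d/%d checked, %d 404s so far" % (n, total, len(bad_ids)))

The main thread would start the producer threads on their chunks of ids, start one analyser thread with an empty bad_ids list, and join everything at the end; the analyser prints progress while the workers run, so you no longer have to wait two hours to see how things are going.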

m01