
I'm very new to Python, so there could be multiple things wrong with my code. But I can't debug it because I get no output at all, not even an error. Would love some help. I'm on Python 3.5. Cheers!

import concurrent.futures
import selenium.webdriver

print('starting')
def simple_function(comic_start, comic_end):
    for urlNumber in range(comic_start, comic_end):
        print('Downloading page http://xkcd.com/%s...' % (urlNumber))
        driver = selenium.webdriver.PhantomJS()
        driver.get('http://xkcd.com/%s' % (urlNumber))
        print('finding page')
        form = driver.title
        print(driver.title)
    driver.quit()
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    executor.map(simple_function, (1, 100))

Output:

runfile('/Users/Seansmac/Desktop/Python/my_classroom/threading_training/concurrent.f1.py', wdir='/Users/Seansmac/Desktop/Python')
starting

EDIT: This is similar to the question "Concurrent.futures usage guide - a simple example of using both threading and processing", but as the OP there said, the answer is too complicated!

SeánMcK

1 Answer

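You were mapping a tuple with two values onto the function. executor.map calls the function once per item in the iterable, so each call received a single argument (first 1, then 100), but your function takes two arguments, comic_start and comic_end. Each call therefore raises a TypeError inside its worker thread. You never see it because executor.map only re-raises a worker's exception when you retrieve that result from the iterator it returns, and your code never does.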

To see this, you could take out your other argument and do:

import concurrent.futures as cf

def simple_function(first_arg):
    print(first_arg)

with cf.ThreadPoolExecutor(max_workers=10) as executor:
    executor.map(simple_function, (1, 100))

Result:

1
100
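
To make the hidden error from the two-argument version visible, you can ask the iterator returned by executor.map for a result. This is only a minimal sketch: two_arg_function is a hypothetical stand-in for your function (same two-argument signature, no Selenium), and error_message is just a name I picked to hold whatever comes back.

```python
import concurrent.futures as cf

def two_arg_function(comic_start, comic_end):
    # Hypothetical stand-in with the same signature as the question's function.
    return list(range(comic_start, comic_end))

with cf.ThreadPoolExecutor(max_workers=2) as executor:
    # map() calls two_arg_function(1) and two_arg_function(100),
    # i.e. one argument per call, so both calls fail in their threads.
    results = executor.map(two_arg_function, (1, 100))
    try:
        # The TypeError is only re-raised here, when a result is requested.
        next(results)
        error_message = None
    except TypeError as exc:
        error_message = str(exc)

print(error_message)
```

If you never touch the results (as in the question), nothing is re-raised and the script appears to do nothing.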

Modified Code:

import concurrent.futures as cf

def simple_function(_range):
    # fill with driver stuff etc.
    for num in _range:
        print('Downloading page http://xkcd.com/%s...' % num)

_max = 200  # upper bound of the range you want (exclusive)
workers = 10
with cf.ThreadPoolExecutor(max_workers=workers) as e:

    cs = _max // workers  # chunk size to split across threads

    # Chunk the full range so that each call of the mapped function
    # iterates over its own sub-range.

    # Each chunk starts at x and ends at x + cs. range() excludes its
    # upper bound, so the next chunk can start exactly at x + cs without
    # overlapping the previous one.

    # min(x + cs, _max) caps the last chunk so we never exceed the
    # maximum range you want.
    ranges = [range(x, min(x + cs, _max))
              for x in range(0, _max, cs)]

    e.map(simple_function, ranges)
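
If you want the pages numbered 1 to 100 as in the question, the same chunking idea can be wrapped in a small helper. This is a sketch under assumptions, not the only way to do it: chunk_ranges and describe_chunk are hypothetical names, and describe_chunk just builds URLs in place of the real driver work. Iterating over the results of executor.map also means any exception raised in a worker is re-raised instead of vanishing.

```python
import concurrent.futures as cf

def chunk_ranges(start, stop, workers):
    """Split range(start, stop) into contiguous, non-overlapping ranges."""
    # Ceiling division, with a floor of 1 so the step is never zero.
    size = max(1, -(-(stop - start) // workers))
    return [range(lo, min(lo + size, stop)) for lo in range(start, stop, size)]

def describe_chunk(chunk):
    # Placeholder for the real work (e.g. fetching each page with a driver).
    return ['http://xkcd.com/%s' % num for num in chunk]

ranges = chunk_ranges(1, 101, 10)  # comics 1..100, ten chunks of ten

with cf.ThreadPoolExecutor(max_workers=10) as executor:
    # Consuming the iterator collects the results in input order
    # and surfaces any exception raised inside a worker.
    pages = [url for urls in executor.map(describe_chunk, ranges) for url in urls]

print(len(pages), pages[0], pages[-1])
```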
Pythonista
  • This is weird: I literally copied and pasted your code into my IDLE (Spyder; not sure if my terminology is correct) and ran it, just to see what errors I'd start with. But I got the exact same output as before (i.e. nothing). I restarted the whole programme, but no success. – SeánMcK Apr 21 '16 at 15:08
  • Try it now. I made an error I didn't notice while editing the answer: I accidentally took out the `0` in the `range`. You need it because the call is specifying `range(start, end, step size)` here. – Pythonista Apr 21 '16 at 15:10
  • Cool, that seems to have done the trick alright. In terms of the chunks, I'm not familiar with the concept. Are you essentially saying that if the range is 200, we're giving each thread its own range ('chunk') to work on? So if there are 10 workers (= threads?) then each worker runs the function on its own distinct range. So thread_1 works on range 0-9; thread_2 works on range 10-19 etc.? – SeánMcK Apr 21 '16 at 15:16
  • Do you mind if I wait until I get it working before marking it answered? (also how do you mark a Q answered?...I'm new here) – SeánMcK Apr 21 '16 at 15:20
  • Right, you have 10 workers (or whatever number you specified), so we chunk the full range. Here I took the chunk size as the integer division 200 // 10, which gives 20, and then split the full range into distinct sub-ranges of that size. Each sub-range is mapped to your function on a worker thread. You can do `print(ranges)` to see the ranges if interested. And sure, you should only mark an answer as accepted if it helped and solves the problem :) As for how to accept, there's a checkmark next to each answer that you can click. – Pythonista Apr 21 '16 at 15:23
  • I'm still stuck. All I can get is the ranges to print out. Here's my code http://pastebin.com/Ehm9nsC2 (don't know how to paste full code outside of a question/answer) – SeánMcK Apr 21 '16 at 15:48
  • I'm still working on this! Here's my code, I think I did everything you mentioned, but I'm still just getting the ranges to print out. Any thoughts? Thanks http://pastebin.com/4rm86bDA – SeánMcK Apr 22 '16 at 11:31
  • Apologies if you felt I was being too demanding, but I'm learning as I go along and this stuff is bloody hard. I've done some basic tutorials, but there also comes a time when you just need to plunge in and try to get things done; the balance can be hard to get right. In saying all that, I was able to figure it out. Many thanks for your help. – SeánMcK Apr 22 '16 at 12:53