
I am using this code to scrape an API:

from concurrent import futures

submissions = get_submissions(1)
# or: with futures.ThreadPoolExecutor(max_workers=4) as executor:
with futures.ProcessPoolExecutor(max_workers=4) as executor:
    for s in executor.map(map_func, submissions):
        collection_front.update({"time_recorded": time_recorded}, {'$push': {"thread_list": s}}, upsert=True)

It works great (and fast) with threads, but when I try to use processes I get a full queue and this error:

  File "/usr/local/lib/python3.4/dist-packages/praw/objects.py", line 82, in __getattr__
    if not self.has_fetched:
RuntimeError: maximum recursion depth exceeded
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.4/concurrent/futures/process.py", line 251, in _queue_management_worker
    shutdown_worker()
  File "/usr/lib/python3.4/concurrent/futures/process.py", line 209, in shutdown_worker
    call_queue.put_nowait(None)
  File "/usr/lib/python3.4/multiprocessing/queues.py", line 131, in put_nowait
    return self.put(obj, False)
  File "/usr/lib/python3.4/multiprocessing/queues.py", line 82, in put
    raise Full
queue.Full

Traceback (most recent call last):
  File "reddit_proceses.py", line 64, in <module>
    for s in executor.map(map_func, submissions):
  File "/usr/lib/python3.4/concurrent/futures/_base.py", line 549, in result_iterator
    yield future.result()
  File "/usr/lib/python3.4/concurrent/futures/_base.py", line 402, in result
    return self.__get_result()
  File "/usr/lib/python3.4/concurrent/futures/_base.py", line 354, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

Note that originally the processes worked great and very fast for small data retrievals, but now they're not working at all. Is this a bug, or what's going on that makes the PRAW object raise a recursion error with processes but not with threads?

sunny
  • The queue that it's complaining about is actually an internal queue called `call_queue`. It's defined with a max size: `self._call_queue = multiprocessing.Queue(self._max_workers + EXTRA_QUEUED_CALLS)`. It's in the process of shutting down the `Executor` when that error occurs. It looks like you may have cut off the top of the first traceback. Can you include that? I've run into that `queue.Full` error before, but it was because I was hacking on `concurrent.futures.ProcessPoolExecutor` and changing its shutdown behavior/timing. I'm wondering if you may have hit a rare bug. – dano Jun 12 '15 at 15:42
  • @dano I have attached the fuller error above in the original question – sunny Jun 12 '15 at 15:51
  • Hmm, This looks like a possible trigger: `RuntimeError: maximum recursion depth exceeded`. That seems to be coming from `/usr/local/lib/python3.4/dist-packages/praw/objects.py`. Any thoughts on that? – dano Jun 12 '15 at 16:08
  • @dano Ugh, now it's giving me this recursion error even though I've gone back to retrieving small amounts of information. The code hasn't changed at all... and I don't get this recursion error when I use ThreadPoolExecutor instead of ProcessPoolExecutor. – sunny Jun 12 '15 at 16:20

1 Answer


I had a similar problem moving from threads to processes, except that I was using executor.submit. I think this might be the same problem you have, but I can't be sure because I don't know in what context your code is running.

In my case, what happened was this: I was running my code as a script, and I didn't use the always-recommended if __name__ == "__main__": guard. When the executor starts a new process, Python imports the .py file and then runs the function passed to submit. Because it imports the file, any code at module level (not inside a function or under that if guard) gets run again, so each process would spawn new processes of its own, producing infinite recursion.

It looks like this doesn't happen with threads.
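Here is a minimal sketch of the fix, using placeholder names (map_func and the dummy input stand in for the question's real helpers): all top-level work moves under the __main__ guard, so worker processes can import the module without re-running it.

    from concurrent import futures

    def map_func(n):
        # placeholder for the real per-submission work
        return n * n

    def main():
        items = range(8)  # stands in for get_submissions(1)
        with futures.ProcessPoolExecutor(max_workers=4) as executor:
            for result in executor.map(map_func, items):
                print(result)

    # Without this guard, each spawned worker re-runs the module's
    # top-level code on import and tries to start its own pool.
    if __name__ == "__main__":
        main()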

dyeray
  • If I get this right, I had a similar problem, too. On Windows, anything using `multiprocessing` needs to call `multiprocessing.freeze_support()` before any code that shouldn't run ahead of the worker function in subprocesses. That's because, with no `fork()` on Windows, subprocesses are started by running the same command line with additional arguments. `freeze_support()` looks for these arguments and runs the `multiprocessing` framework instead of the following code if they're present (a minimal placement sketch follows these comments). – ivan_pozdeev Oct 01 '15 at 22:01
  • In my case `multiprocessing.freeze_support()` doesn't make a difference. It may be related, but it doesn't look like the same problem. The Python documentation says `freeze_support` only has an effect in scripts frozen to an exe, created with py2exe or similar. I am on Windows too, but I don't know whether the behavior on other OSes would be the same. – dyeray Oct 01 '15 at 22:49
  • "The Python documentation says freeze_support only has effect in scripts frozen to an exe" - yes, it does, but this is not so (at least, it wasn't when I last dealt with it). – ivan_pozdeev Oct 01 '15 at 23:44