
I currently have this piece of code (feel free to comment on it too :) )

from multiprocessing import Pool

def threaded_convert_to_png(self):
    paths = self.get_pages()
    pool = Pool()  # no processes argument, so it should default to cpu_count() workers
    result = pool.map(convert_to_png, paths)
    self.image_path = result

On an Intel i7 it spawns eight workers when running on Linux; however, when running on Windows 8.1 Pro it only spawns one worker. I checked, and `cpu_count()` returns 8 on both Linux and Windows.

  • Is there something I am missing here, or doing wrong?
  • Is there a way to fix that problem?

P.S. This is in Python 2.7.6
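
A minimal way to verify how many workers actually start, along the lines later suggested in the comments, is to log from a trivial worker function; here `fake_convert` is a hypothetical stand-in for `convert_to_png`, and the `if __name__ == '__main__':` guard is required on Windows:

import logging
import multiprocessing as mp

def fake_convert(path):
    print path, mp.current_process().name  # shows which worker handles each path

if __name__ == '__main__':
    mp.log_to_stderr().setLevel(logging.DEBUG)  # trace worker spawning/exiting
    pool = mp.Pool()
    pool.map(fake_convert, ['page1.pdf', 'page2.pdf', 'page3.pdf', 'page4.pdf'])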

Drakkainen
  • `Pool.__init__` calls `cpu_count` to get the default number of processes (see [`Lib/multiprocessing/pool.py` at line 146](http://hg.python.org/cpython/file/3a1db0d2747e/Lib/multiprocessing/pool.py#l146)). Also, `__init__` calls `_repopulate_pool` on [line 159](http://hg.python.org/cpython/file/3a1db0d2747e/Lib/multiprocessing/pool.py#l159), which executes a loop on [line 213](http://hg.python.org/cpython/file/3a1db0d2747e/Lib/multiprocessing/pool.py#l213) that spawns the correct number of `Process` instances. Are you sure only one worker is spawned? How are you checking the number of workers? – Bakuriu Feb 21 '14 at 16:31
  • I'm sure because I only see one extra python process (and the conversion takes ages). I even tried passing `Pool(processes=8)`, and again only one worker got spawned. – Drakkainen Feb 21 '14 at 16:33
  • Try to create a [minimal complete code example that shows your issue](http://stackoverflow.com/help/mcve), e.g., use `def f(path): print path, mp.current_process()` instead of `convert_to_png()`, and enable logging with `mp.log_to_stderr().setLevel(logging.DEBUG)`. – jfs Feb 24 '14 at 13:16
  • what is `len(paths)`? – jfs Feb 24 '14 at 13:17
  • Have you properly enclosed your script in `if __name__ == '__main__':`, and is `convert_to_png` properly defined outside of it? (documented here: http://docs.python.org/2/library/multiprocessing.html) – Matt Feb 24 '14 at 14:12
  • @Drakkainen - where do you see the single extra process? Are you sure your sort order (e.g. in Task Manager) isn't obscuring the others? – detly Feb 25 '14 at 06:10
  • @detly I use `Get-Process` from PowerShell. I'm doing some logging, and I'll follow up on the other comments. – Drakkainen Feb 26 '14 at 18:31
  • Fwiw, I'm having a similar problem. On Linux and OS X my code spawns multiple processes, but on Windows the processes seem to be spawned sequentially. What's weird is that it worked yesterday as expected. I'll investigate and report if I find anything. – Tumetsu Feb 27 '14 at 19:31
  • Can you show more code? Show as much code as you need to reproduce the spawning of one worker. What is `paths`? – User Mar 02 '14 at 18:04
  • @Drakkainen did you figure out what the problem was? – Alex Pertsev Mar 04 '14 at 08:17
  • @A.Haaji The underlying library was failing on Windows; thanks to logging I could see it. It would start the thread and just sit there... so when I switched the library, it worked fine. – Drakkainen Mar 06 '14 at 23:42
  • @Drakkainen, oh, I see, nice to hear it. – Alex Pertsev Mar 07 '14 at 08:47

2 Answers


There is one easy way to determine what is happening in your pool: turn on multiprocessing debugging. You can do it like this:

import logging
from multiprocessing import util

# Send multiprocessing's internal log (worker spawning, exiting, etc.) to stderr.
util.log_to_stderr(level=logging.DEBUG)

When the script runs, you will get full information about processes being spawned, running, and exiting.

In any case, a process pool always spawns N processes (where N is the value of the `processes` argument, or `cpu_count()` by default), but the distribution of tasks between the processes can be uneven; it depends on each task's run time.
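
To see both points at once, here is a small self-contained sketch (the sleep durations are arbitrary, chosen to make the imbalance visible) that enables the debug output and prints which worker handled each task:

import logging
import time
from multiprocessing import Pool, current_process, util

def task(seconds):
    time.sleep(seconds)  # a long task ties up one worker, so short tasks pile onto the rest
    return current_process().name

if __name__ == '__main__':
    util.log_to_stderr(level=logging.DEBUG)
    pool = Pool(processes=4)  # spawns exactly 4 workers, busy or not
    print pool.map(task, [2, 0.1, 0.1, 0.1, 0.1, 0.1])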

Alex Pertsev

I managed to solve my similar problem. I'm not sure whether it applies to your case, but I decided to document it here anyway in case it helps someone.

In my case I was analyzing a huge number of tweets (52,000 in total) by dividing them among multiple processes. It worked fine on OS X and on the server, but on my Windows 8.1 machine it was really slow and the processes were activated sequentially. Looking into Task Manager, I noticed that the main Python process's memory usage climbed to around 1.5 GB, and the worker process's memory usage climbed similarly. Then I noticed that an older version of my code, which had a slightly different algorithm, worked fine. In the end, the problem was that I retrieved whole tweets from the database when I only needed their text. This apparently led to the growing memory usage. After I fixed that part, the program launched the worker processes properly.

So based on my experience, I have a hunch that Windows tries to control the RAM usage by blocking the worker processes. If so, check the RAM usage of your processes. This is just speculation on my part, so I'm interested if someone has a better explanation.
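
As an illustration of the fix, a sketch under stated assumptions (the `analyze` function and the row layout are hypothetical): pass the workers only the field they actually need, so far less data has to be pickled and held in memory:

from multiprocessing import Pool

def analyze(text):
    return len(text.split())  # hypothetical per-tweet analysis on the text only

if __name__ == '__main__':
    rows = [(1, 'first tweet text'), (2, 'second tweet text')]  # stand-in for a DB query returning whole tweets
    texts = [text for (_id, text) in rows]  # keep only the text column
    pool = Pool()
    print pool.map(analyze, texts)  # workers receive short strings, not full rows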

Tumetsu