
I'm a multiprocessing newbie. I know something about threading, but I need to increase the speed of this calculation, hopefully with multiprocessing:

Example description: send a string to a thread, alter the string + benchmark it, send the result back for printing.

from threading import Thread

class Alter(Thread):
    def __init__(self, word):
        Thread.__init__(self)
        self.word = word
        self.word2 = ''

    def run(self):
        # Alter string + test processing speed
        for i in range(80000):
            self.word2 = self.word2 + self.word

# Send a string to be altered
thread1 = Alter('foo')
thread2 = Alter('bar')
thread1.start()
thread2.start()

# wait for both to finish (busy-wait; see comments below)
while thread1.is_alive(): pass
while thread2.is_alive(): pass


print(thread1.word2)
print(thread2.word2)

This currently takes about 6 seconds and I need it to go faster.
I have been looking into multiprocessing and cannot find anything equivalent to the above code. I think what I am after is pooling, but the examples I have found have been hard to understand. I would like to take advantage of all 8 cores (multiprocessing.cpu_count()), but I really just have scraps of useful information on multiprocessing and not enough to duplicate the above code. If anyone can point me in the right direction or, better yet, provide an example, that would be greatly appreciated. Python 3, please.

Rhys
  • don't busy-wait for the thread to complete. Use Thread.join()! – soulcheck Jan 08 '12 at 03:08
  • why not? I have done this in most of my coding and if you can provide a good reason, i will change it all :) – Rhys Jan 08 '12 at 03:32
  • 1
    well it's at least as good as busy-waiting and probably makes it passively wait until the thread is terminated without eating the cpu (although i can't find it in the docs i'd wager cpython doesn't busy-wait in it's Thread.join()). – soulcheck Jan 08 '12 at 03:57
  • this probably depends on platform anyway. – soulcheck Jan 08 '12 at 04:08
  • ok that makes sense, I think I'll use this join() moving forward – Rhys Jan 08 '12 at 04:16
  • 1
    @Rhys: Busy-waiting like that will hold the GIL, and prevent your worker threads from running. Also, if you don't have enough CPUs, then a CPU that could be in your worker thread doing useful work, handling an OS interrupt, or dealing with some other process is instead generating heat. – Thanatos Jan 08 '12 at 07:52

1 Answer


Just replace threading with multiprocessing and Thread with Process. Threads in Python are (almost) never used to gain performance because of the big bad GIL! I explained it in another SO post, with some links to documentation and a great talk about threading in Python.

But the multiprocessing module is intentionally very similar to the threading module. You can almost use it as a drop-in replacement!

The multiprocessing module doesn't, AFAIK, offer functionality to enforce the use of a specific set of cores; it relies on the OS implementation. You could use a Pool object and limit the number of workers to the core count, or you could look at another MPI library like pypar. Under Linux you could also use a shell pipeline to start multiple instances on different cores.

Don Question
  • A good read on how Python handles multiprocessing vs threading on multicore is [here](http://www.martinkral.sk/blog/2011/03/python-multicore-vs-threading-example/) – Fuzzy Analysis Jan 08 '12 at 03:07
  • @Don, Yes! It seems to work. I must just check how much faster it's running. One thing, though: the code above does not specify the number of cores used ... would this be easy to include? – Rhys Jan 08 '12 at 03:16
  • Hi, @Don, what you mean by **big bad GIL**? I'm a python newbie. – WoooHaaaa Sep 22 '12 at 01:26
  • @mrroy: GIL = global interpreter lock; basically you have "real" threads (hardware/OS supported), but you don't gain performance, and may even lose some. Threading in CPython (2.x) is meant for concurrent I/O operations. – Don Question Sep 23 '12 at 18:29