
I am trying to do threaded convolution with PyFFTW in order to calculate a large number of 2D convolutions simultaneously. (Separate processes are not needed, since the GIL is released during Numpy operations.) Here is the canonical model for doing so: http://code.activestate.com/recipes/577187-python-thread-pool/

(Py)FFTW is so fast because it reuses plans. These have to be set up separately for each thread in order to avoid access violation errors, like this:

import numpy as np
import fftw3
from threading import Thread

class Worker(Thread):
    """Thread executing tasks from a given tasks queue"""
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True

        # Make separate fftw plans for each thread.
        # (someshape and someshape_semi are defined elsewhere.)
        flag_for_fftw = 'patient'

        # Initialize the arrays for the forward fft.
        self.inputa = np.zeros(someshape, dtype='float32')
        self.outputa = np.zeros(someshape_semi, dtype='complex64')

        # Create a forward plan.
        self.fft = fftw3.Plan(self.inputa, self.outputa, direction='forward',
                              flags=[flag_for_fftw], nthreads=1)

        # Initialize the arrays for the inverse fft.
        self.inputb = np.zeros(someshape_semi, dtype='complex64')
        self.outputb = np.zeros(someshape, dtype='float32')

        # Create the backward plan.
        self.ifft = fftw3.Plan(self.inputb, self.outputb, direction='backward',
                               flags=[flag_for_fftw], nthreads=1)

        self.start()

In this way the per-thread arrays and plans self.inputa, self.outputa, self.fft, self.inputb, self.outputb and self.ifft can be passed to the actual convolver within the run method of the Worker class.
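
Following the recipe's run loop, a minimal sketch of what I mean (the (func, args, kargs) task format comes from the recipe; the convolver signature shown here is only an assumption):

    def run(self):
        while True:
            func, args, kargs = self.tasks.get()
            try:
                # Hand this thread's private arrays and plans to the convolver
                # together with the task's own arguments.
                func(self.inputa, self.outputa, self.fft,
                     self.inputb, self.outputb, self.ifft, *args, **kargs)
            except Exception as e:
                print(e)
            finally:
                self.tasks.task_done()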

This is all nice, but we might as well import the ThreadPool class:

from multiprocessing.pool import ThreadPool

But how should I define the initializer in ThreadPool to get the same result? According to the docs http://docs.python.org/library/multiprocessing.html, "each worker process will call initializer(*initargs) when it starts"; for ThreadPool it is each worker thread that does so, as you can easily check in the Python source code.
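
A quick throwaway snippet (not from the recipe) showing the initializer being run once in each worker thread:

from multiprocessing.pool import ThreadPool
import threading

def init():
    # Runs once in each worker thread when the pool starts.
    print("initializer ran in", threading.current_thread().name)

po = ThreadPool(2, initializer=init)
po.map(abs, range(4))  # the message appears twice, once per worker thread
po.close()
po.join()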

However, when you set up the ThreadPool, for example with 2 threads:

po = ThreadPool(2, initializer=tobedetermined)

and you run it, perhaps in some loop:

po.apply_async(convolver, (some_input,))

how can you make convolver use the state set up by initializer? In other words, how can it use a separate FFTW plan in each thread without recomputing the plan for every convolution?

Cheers, Alex.


1 Answer

You can wrap the convolver call in a function that uses thread-local storage (threading.local()) to initialize the FFTW plans once per thread and remember the result.
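
A minimal sketch of that idea, reusing the question's plan setup (do_convolution, inputs, someshape and someshape_semi are placeholders standing in for your own code, and this is untested):

import threading
from multiprocessing.pool import ThreadPool

import numpy as np
import fftw3

tls = threading.local()  # each thread sees its own attributes on this object

def init_plans():
    # Pool initializer: build this thread's private arrays and plans once.
    tls.inputa = np.zeros(someshape, dtype='float32')
    tls.outputa = np.zeros(someshape_semi, dtype='complex64')
    tls.fft = fftw3.Plan(tls.inputa, tls.outputa, direction='forward',
                         flags=['patient'], nthreads=1)

    tls.inputb = np.zeros(someshape_semi, dtype='complex64')
    tls.outputb = np.zeros(someshape, dtype='float32')
    tls.ifft = fftw3.Plan(tls.inputb, tls.outputb, direction='backward',
                          flags=['patient'], nthreads=1)

def convolver(some_input):
    # The worker thread running this call sees only its own plans,
    # which were created once by init_plans and are reused here.
    return do_convolution(some_input, tls.inputa, tls.outputa, tls.fft,
                          tls.inputb, tls.outputb, tls.ifft)

po = ThreadPool(2, initializer=init_plans)
results = [po.apply_async(convolver, (x,)) for x in inputs]
po.close()
po.join()

Alternatively, convolver itself can start with if not hasattr(tls, 'fft'): init_plans(), which builds the plans lazily on first use in each thread and makes the initializer argument unnecessary.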
