In the following code, which is a simplified example of my main code, I tried to use pathos.multiprocessing to speed up the iterations of a loop. The output of each iteration run with multiprocessing is a 2-D array. I used pathos.multiprocessing instead of multiprocessing because I wanted to use it inside a class method. I used the apipe method of pathos.multiprocessing to collect the output in a list, but it returns an empty list. I have no idea why it fails.

import numpy as np
import random
import pathos.multiprocessing as mp
class Testsystematics(object):
      def __init__(self, x, y, NTH = None, THMIN = None, THMAX = None, NRESAMPLE = None):
         self.x        = x
         self.y        = y
         self.nbins    = NTH
         self.bmin     = THMIN
         self.bmax     = THMAX
         self.nresample= NRESAMPLE
         self.bins     = np.linspace(self.bmin, self.bmax, self.nbins+1, True).astype(np.float)
         self.sample   = np.array([[random.choice(range(len(self.y))) for _ in xrange(len(self.y))] for i in range(self.nresample)])
         self.result_list=[]
      def log_result(self, result):
          self.result_list.append(result)
      def bootstrapping(self, k):
          xi_p     = np.zeros(self.nbins, float)
          xi_m     = np.zeros(self.nbins, float)
          nind     = np.zeros(self.nbins, float)
          for i in range(len(self.x)):
              for j in range(len(self.x)):
                  if (i!=j): 
                     sep= np.sqrt(self.x[i]**2+self.x[j]**2)
                     index= np.searchsorted(self.bins, sep , side='right')-1 
                     sind = np.sin(sep)
                     if ((sep< self.bins[-1]) and (sep>=self.bins[0])):
                        xi_p[index] += sind*(np.mean(y)-np.median(y))
                        xi_m[index] += sind*np.std(y)
                        nind[index] += 1.0
          for i in range(self.nbins):
              xi_p[i]=xi_p[i]/nind[i]
              xi_m[i]=xi_m[i]/nind[i]
          return np.vstack((xi_p,xi_m))
      def twopcf(self):   
         if (self.sys_type==1):
            pool = mp.ProcessingPool(16)
            for n in range(self.nresample):
                pool.apipe(self.bootstrapping, args=(n,), callback=self.log_result)

shape,scale=0.5, 0.6
x=np.random.gamma(shape, scale, 10000)
mu1, sigma1 = 0, 0.5 # mean and standard deviation
mu2, sigma2 = 0.1, 0.7 # mean and standard deviation

y = np.random.normal(mu1, sigma1, 1000)+np.random.normal(mu2, sigma2, 1000)
sysTest=Testsystematics(x, y, NTH = 10, THMIN = 0, THMAX = 5, NRESAMPLE = 100)

Any suggestions?

Dalek

1 Answer

I'm the pathos author. I tried your code; it runs without raising an error, but nothing ends up in result_list. I believe that is because you are using apipe incorrectly. The correct use of apipe is as follows:

>>> import pathos
>>> def squared(x):
...   return x**2
... 
>>> pool = pathos.multiprocessing.ProcessingPool()
>>> res = pool.apipe(squared, 5)
>>> res.get()
25

self.bootstrapping takes self and k, so when you call it as an instance method you have to provide k in the apipe call, passed positionally as 5 is above. There is no callback; if you want one, you'd need to add it to your function.

Note that the return value is retrieved by (1) getting a return object, and (2) by calling get on the return object.

Since you call apipe inside a for loop, I'd suggest using pool.amap (or pool.imap) instead; then the loop iterations run in parallel.

Mike McKerns
  • Do you think returning a 2-D array each time in a loop would be a problem for multiprocessing? – Dalek Jul 28 '15 at 18:46
  • Not unless it's huge. It also depends on how it's pickled… so multiprocessing will do it inefficiently by default, while numpy knows innately how to make a smaller pickle. Remember, multiprocessing is copying these objects… so you have speed and memory considerations. – Mike McKerns Jul 28 '15 at 19:21
  • Well, your comment about using `get` was helpful. However, when I still use `apipe` and check whether it uses all the cores, the work is distributed between cores, but only one of them runs at full CPU load while the rest use no CPU at all. Basically, it still works on one core. – Dalek Jul 28 '15 at 21:46
  • Absolutely. `apipe` is one core only… it's "a pipe". What you want is `amap`, which is asynchronous parallel batch. – Mike McKerns Jul 28 '15 at 23:27
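
One rough way to gauge the copy cost raised in the comments is to measure how many bytes a returned array occupies once pickled. The sketch below uses the standard `pickle` module as a stand-in for whatever serializer the pool actually uses, which is an assumption; the helper name `pickled_sizes` and the array shape are also made up for illustration.

```python
import pickle

import numpy as np


def pickled_sizes(shape=(1000, 2), seed=0):
    # Build a 2-D float64 array similar in spirit to one iteration's result.
    rng = np.random.default_rng(seed)
    arr = rng.normal(size=shape)
    proto = pickle.HIGHEST_PROTOCOL
    # Pickle the array directly, then the same data as nested Python lists.
    as_array = len(pickle.dumps(arr, protocol=proto))
    as_lists = len(pickle.dumps(arr.tolist(), protocol=proto))
    return arr.nbytes, as_array, as_lists


if __name__ == "__main__":
    raw, as_array, as_lists = pickled_sizes()
    print("raw bytes:", raw)
    print("pickled ndarray:", as_array)
    print("pickled nested lists:", as_lists)
```

Every result returned from a worker is copied back to the parent process in serialized form, so this number multiplied by the number of iterations gives a ballpark for the transfer volume.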