
I was shocked to learn how few tutorials and guides there are on the internet about Parallel Python (PP) and handling classes. I've run into a problem where I want to instantiate several objects of the same class and afterwards retrieve some of their variables (for instance, reading 5 data files in parallel and then retrieving their data). Here's a simple piece of code to illustrate my problem:

import pp

class TestClass:
    def __init__(self, i):
        self.i = i

    def doSomething(self):
        print "\nI'm being executed!, i = "+str(self.i)
        self.j = 2*self.i
        print "self.j is supposed to be "+str(self.j)
        return self.i

class parallelClass:
    def __init__(self):
        job_server = pp.Server()
        job_list = []
        self.instances = [] # for storage of the class objects
        for i in xrange(3):
            TC = TestClass(i) # initiate a new instance of the TestClass
            self.instances.append(TC) # store the instance
            job_list.append(job_server.submit(TC.doSomething, (), ())) # add some jobs to the job_list
        results = [job() for job in job_list] # execute order 66...

        print "\nIf all went well there's a nice bunch of objects in here:"
        print self.instances
        print "\nAccessing an object's i works ok, but accessing j does not"
        print "i = "+str(self.instances[2].i)
        print "j = "+str(self.instances[2].j)

if __name__ == '__main__' :
    parallelClass() # initiate the program

I've added comments for your convenience. What am I doing wrong here?

MPA
  • What's going wrong? What is the expected output of your program and what are you getting instead? By the way: why are you using a class `__init__` when you actually wanted something like a `main` function. I'd never expect that creating an instance of a class will block my entire program. – Bakuriu Feb 10 '13 at 19:21
  • @Bakuriu I'd expect the program to print the value of `j` of one instance. I get an `AttributeError: TestClass instance has no attribute 'j'` instead. Don't worry about `__init__`, this code is just a simplified representation of my actual program. – MPA Feb 10 '13 at 19:37
  • I believe the problem is that the code is executed on different objects. Parallel Python simply pickles the objects and sends them to the subprocesses, so you modify the local state of those copies, which does not affect the original instances. You can check this by printing the `id` of `self` inside `doSomething` and the `id` of the elements in `instances`. – Bakuriu Feb 10 '13 at 19:40
  • @Bakuriu you are absolutely right, the instances do not match. Do you have any idea on how to avoid/fix this? – MPA Feb 10 '13 at 19:46
  • Parallel Python does not provide automatic synchronization between objects, which means you can only communicate via the results of the function. The result of `doSomething` should carry all the information you need. – Bakuriu Feb 10 '13 at 19:49
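The copy semantics described in these comments can be reproduced without pp. The sketch below assumes Python 3 and uses `copy.deepcopy` as a stand-in for the pickle round trip pp performs when it ships an object to a worker; `run_in_worker` is a hypothetical helper, not part of pp.

```python
import copy


class TestClass:
    def __init__(self, i):
        self.i = i

    def doSomething(self):
        self.j = 2 * self.i  # sets j on whichever object runs the method
        return self.j


def run_in_worker(method_name, obj):
    """Hypothetical stand-in for a pp worker: it operates on a copy of obj.

    pp pickles the object instead of deep-copying it; the effect on the
    original is the same. The copy is returned here only so we can inspect
    it; a real pp job sends back nothing but the return value.
    """
    worker_copy = copy.deepcopy(obj)
    return getattr(worker_copy, method_name)(), worker_copy


original = TestClass(2)
result, worker_copy = run_in_worker("doSomething", original)

print(id(original) == id(worker_copy))  # False: two distinct objects
print(hasattr(worker_copy, "j"))        # True: the copy was modified
print(hasattr(original, "j"))           # False: the original never got j
print(result)                           # 4: only the return value comes back
```

This is exactly the `AttributeError` from the question: `j` exists only on the copy, so the return value is the only channel back to the caller.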

1 Answer


You should use callbacks.

A callback is a function that you pass to the `submit` call. It will be called with the result of the job as its argument (have a look at the API for more arcane usage).

In your case, first set up a callback:

class TestClass:
    def __init__(self, i):
        self.i = i  # unchanged from the question

    def doSomething(self):
        j = 2 * self.i
        return j  # It's REQUIRED that you return j here.

    def set_j(self, j):
        self.j = j  # called with the job's result

Then add the callback to the job's `submit` call:

class parallelClass:
    def __init__(self):
        # your code...
        job_list.append(job_server.submit(TC.doSomething, callback=TC.set_j))

And you're done.

I also changed `doSomething` so it no longer sets `self.j`; it only uses a local `j` variable.

As mentioned in the comments, in pp you can only communicate through the result of your job. That's why you have to return this variable: it is what gets passed to the callback.
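To see the two snippets above working together without installing pp, here is a minimal Python 3 sketch. `fake_submit` and `FakeJob` are hypothetical stand-ins, not pp's API: they mimic a worker by running the method on a `copy.deepcopy` of the instance (pp pickles it instead), then pass the return value to the callback in the calling process, which is the behaviour the answer relies on.

```python
import copy


class TestClass:
    def __init__(self, i):
        self.i = i

    def doSomething(self):
        j = 2 * self.i  # local variable; nothing is stored on the worker's copy
        return j        # the return value is all that comes back

    def set_j(self, j):
        self.j = j      # runs in the calling process, on the original instance


class FakeJob:
    """Hypothetical stand-in for the job object returned by pp's submit()."""

    def __init__(self, result):
        self._result = result

    def __call__(self):
        return self._result


def fake_submit(bound_method, callback=None):
    """Hypothetical stand-in for submit(func, callback=...).

    Runs the method on a copy of its instance, as a worker would,
    then hands the result to the callback.
    """
    worker_copy = copy.deepcopy(bound_method.__self__)
    result = getattr(worker_copy, bound_method.__name__)()
    if callback is not None:
        callback(result)
    return FakeJob(result)


instances = [TestClass(i) for i in range(3)]
jobs = [fake_submit(tc.doSomething, callback=tc.set_j) for tc in instances]
results = [job() for job in jobs]

print(results)                     # [0, 2, 4]
print([tc.j for tc in instances])  # [0, 2, 4]: set on the originals by set_j
```

The callback works because it is a bound method of the original instance: the worker never touches that object, but the callback runs where the object lives.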

Thomas Orozco