
I'm working on a program that runs in parallel using dispy. I use dispy to create tasks and then distribute them to different CPUs for execution.

I have standard libraries and libraries I developed myself (data and connection).

The code is like this:

import dispy
import sys
import data
import connection

def compute(num):
    # some code that calls data and connection methods and generates a solution
    return solution

def main():
    cluster = dispy.JobCluster(compute)
    jobs = []

    for i in range(10):
        job = cluster.submit(i)
        job.id = i # optionally associate an ID to job (if needed later)
        jobs.append(job)

    for job in jobs:
        job()  # waits for the job to finish
        print("Result = " + str(job.result))
        print("Exception = " + str(job.exception))

if __name__ == "__main__":
    main() 


The problem is this: if I work with data and connection inside main, everything works fine, and it also works if I call compute as a regular function instead of going through dispy. But when I run it through dispy and compute calls a data function, it throws an exception that data is not defined and prints Exception = None.

Any help? The documentation suggests using setup, but I can't figure out how it works.
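
From the documentation I think setup is meant to be used roughly like the sketch below, with my data.py and connection.py transferred through depends and made global on each node, but I'm not sure I'm reading it right:

def setup():
    # supposed to run once on each node before any jobs
    global data, connection
    import data
    import connection
    return 0

def cleanup():
    global data, connection
    del data, connection

cluster = dispy.JobCluster(compute,
                           depends=['data.py', 'connection.py'],
                           setup=setup, cleanup=cleanup)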

A77ak
  • can you add the stacktrace for the exceptions you are getting? – saq7 May 29 '16 at 15:45
  • `Exception = Traceback (most recent call last): File "dispynode.py", line 186, in _dispy_job_func __dispy_job_name) in globals() File "", line 1, in File "", line 47, in compute NameError: global name 'data' is not defined` – A77ak May 29 '16 at 16:30

3 Answers


Put the import data call inside the compute function.

Dispy ships the function to call along with its arguments to the new process. The new process doesn't have data imported. That's why adding import data inside the function definition should fix this.
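
A rough sketch of what that looks like, assuming the data and connection modules are importable on every node (installed there, or on the node's Python path):

def compute(num):
    # imported here so the modules exist in the worker process
    import data
    import connection
    # some code that calls data and connection methods and generates a solution
    return solution

If the modules are not available on the nodes, you have to transfer them with depends instead.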

saq7
  • If I do that, the second module imported returns an error: ImportError: No module named connection – A77ak May 29 '16 at 16:51
  • then you need to add `import connection` to the compute function. The idea is that all modules needed to run compute must be imported inside compute – saq7 May 29 '16 at 16:52
  • It is in the compute function – A77ak May 29 '16 at 17:29
  • In the sample code, the imports are NOT in the compute function. They're in the same file as the compute function. Have you revised this? – Chris Johnson Jun 14 '16 at 21:47
Specify that the compute function depends on whichever modules you need:

JobCluster(compute, depends=[data])
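
For example, assuming data and connection are importable modules on the client machine, something like this should ship both of them to the nodes:

import dispy
import data
import connection

cluster = dispy.JobCluster(compute, depends=[data, connection])

depends also accepts file paths (e.g. 'data.py'), which helps when the modules are not installed on the nodes.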

ddm-j

If it is a module that you know is installed on all of the machines, you can just import data and connection inside the compute function.

I know it is not elegant, but it works for me. There are two options:

  • Get rid of the main function and put its code in the if __name__ == "__main__" block, because main is likely to be executed when the function gets sent to the cluster.
  • Define everything your compute function needs inside one big function and pass that to the cluster; this is a very simple yet powerful approach.

import dispy
import sys


def compute(num):
    def data_func1(json_):
        #do something to json_
        return json_
    def data_func2(json_):
        #do something diff
        return json_
    # some code that calls data and connection methods and generates a solution
    return solution

if __name__ == "__main__":
    cluster = dispy.JobCluster(compute)
    jobs = []

    for i in range(10):
        job = cluster.submit(i)
        job.id = i # optionally associate an ID to job (if needed later)
        jobs.append(job)

    for job in jobs:
        job()
        print "Result = " + str(job.result)
        print "Exception = " + str(job.exception)

Or define all of your functions in the script and pass all of them as depends at job cluster creation time, like this:

import dispy
import sys

def data_func1(json_):
    #do something to json_
    return json_
def data_func2(json_):
    #do something diff
    return json_

class DataClass:
    pass

def compute(num):

    # some code that calls data and connection methods and generates a solution
    return solution

if __name__ == "__main__":
    cluster = dispy.JobCluster(compute, depends=[data_func1,
                                                 data_func2,
                                                 DataClass])
    jobs = []

    for i in range(10):
        job = cluster.submit(i)
        job.id = i # optionally associate an ID to job (if needed later)
        jobs.append(job)

    for job in jobs:
        job()
        print "Result = " + str(job.result)
        print "Exception = " + str(job.exception)