
I would like to take an existing function (from scikit-learn for example: specifically the "predict" function), and apply it using multiple cores to some dataset.

My first naive approach:

def parallel_predict(classifier):
    @dview.parallel(block=True)
    def predict(matrix):
        return classifier.predict(matrix)
    return predict

This doesn't work: the extra cores never spin up. Is there a way to make it work?

Or is there some way to pass "non-iterable" arguments (like the classifier here) to a @dview.parallel function?

Andrew Spott
  • Are you trying to parallelise a single call to predict()? The simple methods of parallelisation are all basically ways to farm out multiple calls to a function so that they can run on different cores or different machines. Turning a serial function into a parallel one is usually more involved. – Thomas K May 29 '15 at 22:15
  • Yes, I'm trying to parallelize a single call to predict(). @dview.parallel decorates a function so that when run on an iterable, it splits the iterable up and sends each of them to a different client: which is what I want to do here. Unfortunately, it seems to require that the arguments are ALL iterable. – Andrew Spott May 29 '15 at 22:23
  • Can you wrap the function call in a lambda that only exposes the iterable arguments you want to parallelise over? – Thomas K May 29 '15 at 23:07
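The wrapping idea from the last comment can be sketched without a cluster: bind the non-iterable classifier argument up front, so the map-style API only ever sees the iterable one. Everything below is illustrative — the threshold "classifier" and the serial map stand in for a real estimator and for dview.parallel:

```python
from functools import partial

def predict(classifier, chunk):
    # stand-in for classifier.predict(chunk); here "classifier" is just
    # a threshold so the sketch runs without scikit-learn
    return [int(x > classifier) for x in chunk]

# Fix the non-iterable argument now, leaving a one-argument function
# that a parallel map can split work across.
predict_chunk = partial(predict, 0.5)

chunks = [[0.1, 0.9], [0.6, 0.2]]
results = list(map(predict_chunk, chunks))  # serial stand-in for the parallel map
print(results)  # [[0, 1], [1, 0]]
```

A lambda (lambda chunk: predict(clf, chunk)) does the same job, but partial avoids closure-pickling surprises when the function has to be shipped to remote engines.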

1 Answer


A couple of thoughts, both based on the remote execution doc. I'm used to the @remote decorator rather than the @parallel one you've used, but hopefully they still apply to your case (I can't seem to get that doc to load today, for some reason).

Is it the case that the remote execution is failing because the module your classifier comes from (scikit-learn) is not importable on the engines? If so, this could be solved by adding an explicit import statement inside your decorated function, by using with dview.sync_imports(): import sklearn (as per this example), or by adding a @require('sklearn') decorator (from that same section of the doc). For that last option, I'm not sure how multiple decorators interact (probably easiest to just give it a whack).
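None of these ipyparallel variants can run without a cluster, but the first option — importing inside the function body — boils down to the following pattern. Here math and a pickle round-trip stand in for sklearn and for shipping the function to an engine:

```python
import pickle

def row_norm(vec):
    # import inside the function body so the name resolves wherever the
    # function actually runs (e.g. on a remote engine), not just on the
    # client where the module happens to be loaded
    import math
    return math.sqrt(sum(v * v for v in vec))

# crude stand-in for sending the function to an engine and calling it there
shipped = pickle.loads(pickle.dumps(row_norm))
print(shipped([3.0, 4.0]))  # 5.0
```

The point is simply that the decorated function must carry its own imports (or have them injected via sync_imports/@require), because the engines start with a clean namespace.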

The second thought is that you could check for the remote exception(s) (here's the doc on that). This would be a lot more explicit than just getting nothing back. For example, something like:

x = e0.execute('1/0')
print(x.metadata['error'])

x = predict(matrix)
print(x.metadata['error'])
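Without a running cluster the metadata lookup can't be demonstrated directly, but the underlying idea — the remote exception is captured and handed back rather than silently dropped — can be sketched locally. run_and_capture is a made-up helper, not part of ipyparallel:

```python
def run_and_capture(fn, *args):
    # local stand-in for ipyparallel recording failures in
    # AsyncResult.metadata['error']: run fn and capture any exception
    # instead of letting it propagate
    meta = {'error': None}
    result = None
    try:
        result = fn(*args)
    except Exception as exc:
        meta['error'] = repr(exc)
    return result, meta

_, meta = run_and_capture(lambda: 1 / 0)
print(meta['error'])  # ZeroDivisionError('division by zero')
```

Checking that field (instead of just noticing that nothing came back) is usually the fastest way to find out the classifier or its module never made it to the engines.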
Roland