The following code works if the module `user.py` is in the same directory as the script, but fails if it is in a different directory. The error message I get is `ModuleNotFoundError: No module named 'user'`.

import multiprocessing as mp
import imp

class test():
    def __init__(self,pool):

        pool.processes=1
        usermodel=imp.load_source('user','D:\\pool\\test\\user.py').userfun
        # The file D:\\pool\\test\\user.py looks like this:
        #   def userfun():
        #       return 1

        vec=[]
        for i in range(10):
            vec.append([usermodel,i])

        pool.map(self.myfunc,vec)

    def myfunc(self,A):
        userfun=A[0]
        i=A[1]
        print (i,userfun())
        return

if __name__=='__main__':
    pool=mp.Pool()
    test(pool)

If `myfunc` is called directly, without the pool, the code works regardless of whether `user.py` is in the same directory as the main script or in `\test`. Why can't the pooled process find `user.py` in a separate directory? I have tried other approaches, such as modifying my path and then running `import user`, and using `importlib`, all with the same result.

I am using Windows 7 and Python 3.6.

self.bcl
  • Try `open('D:\\pool\\test\\user.py', 'r').close()`. Do you get any errors? – smac89 Apr 12 '18 at 22:33
  • First, which version of Python are you using? (And, if it's not 2.7 or 3.3, why are you using `imp`?) – abarnert Apr 12 '18 at 22:33
  • Second, give us the complete traceback, not just the error message. – abarnert Apr 12 '18 at 22:35
  • Anyway, I can't reproduce this problem in 3.6. In 2.7, I have to make various changes before I can even get to this point—starting with the fact that you can't pass `self.myfunc` to `Pool.map` because it can't be pickled. Once I fix all of those, it's basically the same problem as `self.myfunc`—`user.userfun` can't be pickled either. – abarnert Apr 12 '18 at 22:51
  • @smac89 - no errors just trying to open and close the file. – self.bcl Apr 12 '18 at 23:40
  • @abarnert - using 3.6. I'm using `imp` because it's simple. I have tried other methods and they all give the same error. – self.bcl Apr 12 '18 at 23:42
  • Ah, multiprocessing. One of the leakiest abstractions in the standard library. – user2357112 Apr 12 '18 at 23:43
  • @abarnert - I have now reproduced the problem on two different machines by just copying and pasting the code above, and adding `user.py` to a separate directory. Thanks for your help. – self.bcl Apr 12 '18 at 23:45

1 Answer

multiprocessing tries to pretend it's just like threading, but the abstraction leaks like a sieve. One of the ways it leaks is that communicating with worker processes involves a lot of implicit pickling and data copying.

When you try to send `usermodel` to a worker, multiprocessing implicitly pickles it and has the worker unpickle the result. Functions are pickled by reference: the pickle records only the module name and function name, so the worker thinks it is supposed to do `from user import userfun` to access `userfun`. It doesn't know that `user` needs to be loaded with `imp.load_source` from a specific filesystem location, so it can't reconstruct `usermodel`.
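To see the pickling behaviour without involving a pool at all, here is a minimal sketch. It uses a throwaway module name, `user_demo_mod`, and a temporary directory, both of which are my own stand-ins for `user` and `D:\pool\test`. Deleting the entry from `sys.modules` is only a rough simulation of a freshly started worker, not what multiprocessing literally does.

```python
import importlib.util
import os
import pickle
import sys
import tempfile


def load_from_path(name, path):
    """Load a module from an explicit file path and register it in sys.modules."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for D:\pool\test\user.py
        path = os.path.join(tmp, "user_demo_mod.py")
        with open(path, "w") as f:
            f.write("def userfun():\n    return 1\n")

        mod = load_from_path("user_demo_mod", path)

        # The pickle holds a reference (module name + function name), not the code.
        payload = pickle.dumps(mod.userfun)
        contains_names = b"user_demo_mod" in payload and b"userfun" in payload

        # Roughly simulate a fresh worker: the module has never been loaded there,
        # and its directory is not on sys.path.
        del sys.modules["user_demo_mod"]
        try:
            pickle.loads(payload)
            error = None
        except ModuleNotFoundError as exc:
            error = exc
        return contains_names, error
```

Calling `demo()` should return `True` together with a `ModuleNotFoundError`, which is the same failure the worker hits when it tries to import `user`.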

How this problem manifests is OS-dependent: if multiprocessing uses the `fork` start method, the workers inherit the already-loaded `user` module from the parent process, so the import succeeds. In Python 3.6, `fork` is the default on Unix, but it is unavailable on Windows.
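One possible workaround, sketched here under my own assumptions rather than taken from the question, is to send the worker the file path (a plain string, which pickles fine) and let each worker load the module itself. The helper names `load_userfun` and `run` are hypothetical, and `myfunc` is moved to module level so that it can be pickled by reference.

```python
import importlib.util
import multiprocessing as mp

# Per-process cache so each worker loads the file only once.
_cache = {}


def load_userfun(path):
    """Load `userfun` from the file at `path` inside the current process."""
    if path not in _cache:
        spec = importlib.util.spec_from_file_location("user", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        _cache[path] = module.userfun
    return _cache[path]


def myfunc(args):
    """Worker function: receives the path and an index, not the function object."""
    path, i = args
    userfun = load_userfun(path)
    return i, userfun()


def run(path, n=10):
    """Map myfunc over n tasks using a single-process pool."""
    with mp.Pool(processes=1) as pool:
        return pool.map(myfunc, [(path, i) for i in range(n)])
```

On Windows, `run('D:\\pool\\test\\user.py')` would need to be called from under an `if __name__ == '__main__':` guard, as in the question. Adding the directory to `sys.path` inside each worker (for example through the pool's `initializer` argument) is another option along the same lines.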

user2357112