I am working on a computer cluster which has NumPy 1.4.1 installed in the usual location (/usr/lib64/...). Since I want to use NumPy 1.7.0, I have installed it under /.../myPath and added export PYTHONPATH=/.../myPath to my .bashrc, so that import numpy automatically loads NumPy 1.7.0. This works fine, except for a peculiarity when using Parallel Python. To load the correct NumPy module in each worker process, I modify sys.path inside the job function, as those processes seem to ignore the $PYTHONPATH variable:
import pp
import numpy

def try2():
    sys.path.insert(0, '/.../myPath')  # prepend my NumPy location before importing
    import numpy
    a = numpy.random.rand(4, 4)
    return numpy.__version__

print numpy.__version__
job_server = pp.Server(2, ppservers=())
jobs = [job_server.submit(try2, (), (), ("sys",)),
        job_server.submit(try2, (), (), ("sys",))]
for job in jobs:
    print job()
The output is as desired:
1.7.0
1.7.0
1.7.0
However, when I call it with an ndarray argument like this:
import pp
import numpy

def try2(a):
    sys.path.insert(0, '/.../myPath')  # same path fix as before
    import numpy
    return numpy.__version__

print numpy.__version__
a = numpy.random.rand(4, 4)
job_server = pp.Server(2, ppservers=())
jobs = [job_server.submit(try2, (a,), (), ("sys",)),
        job_server.submit(try2, (a,), (), ("sys",))]
for job in jobs:
    print job()
the output changes to:
1.7.0
1.4.1
1.4.1
My interpretation: the worker process receives a numpy.ndarray argument as soon as it is called, and therefore searches for a module named numpy before I get a chance to modify sys.path, so it picks up the system-wide 1.4.1. Any ideas on how to fix this?
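If this interpretation is right, one workaround I can imagine is to avoid shipping the ndarray itself and instead pass only plain Python objects, reconstructing the array inside the worker after sys.path has been fixed and the correct NumPy has been imported. A minimal sketch, assuming the array round-trips through tostring()/fromstring() (the names try3 and payload are just illustrative):

import pp
import numpy

def try3(raw, shape, dtype_name):
    sys.path.insert(0, '/.../myPath')  # fix the path first...
    import numpy                       # ...then import the right NumPy
    a = numpy.fromstring(raw, dtype=dtype_name).reshape(shape)
    return numpy.__version__

print numpy.__version__
a = numpy.random.rand(4, 4)
# Ship only plain objects (str, tuple), so unpacking the arguments in
# the worker does not trigger an early "import numpy".
payload = (a.tostring(), a.shape, a.dtype.name)
job_server = pp.Server(2, ppservers=())
jobs = [job_server.submit(try3, payload, (), ("sys",)) for _ in range(2)]
for job in jobs:
    print job()

Since the arguments are now a raw byte string and two tuples, deserializing them should no longer force an import of numpy before the job function runs, but I have not verified whether this is the cleanest way to do it.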