
I'm using MRJob on machine A to launch MapReduce jobs on machines B_0 thru B_10. The job has dependencies that require it to be run not with the default /bin/python (i.e. the output of which python on machine A) but with /path/to/weird/python, which exists on the B's but not on A.

How do I tell mrjob to use /bin/python locally to launch the job, but /path/to/weird/python to run it on the B's once it's in the Hadoop cluster?

The --interpreter argument seems to determine the interpreter for both local and Hadoop. Is there another option to specify them individually?

Or is there some reason that the interpreter used must lie at the same path on both machines?
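(For context: mrjob's runner configuration may allow splitting the two. Below is a sketch of an `mrjob.conf`, assuming the `python_bin` and `steps_python_bin` options documented for mrjob's Hadoop runner around this era — the option names are worth checking against your mrjob version's docs.)

```yaml
# mrjob.conf — a sketch, not a verified config
runners:
  hadoop:
    python_bin: /path/to/weird/python   # interpreter used to run tasks on the cluster
    steps_python_bin: /bin/python       # interpreter used locally to query the job's steps
```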

Eli Rose

1 Answer


Add a shebang line at the top of your Python file(s) to tell the system which interpreter to use.

Using the `env` program in the shebang lets both the local machine and the Hadoop cluster resolve the correct interpreter from their own `PATH`, without you needing to state its location explicitly, e.g.:

`#!/usr/bin/env python`

or

`#!/usr/bin/env python3`
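`env` resolves the interpreter name by searching `PATH`, which is why the same shebang can point to different interpreters on different machines. A quick way to see what that lookup does, sketched with the standard library (`shutil.which` performs essentially the same `PATH` search as `env`):

```python
import shutil

# Searches the current PATH, like `/usr/bin/env sh` would
print(shutil.which("sh"))

# Restricting the search path changes (or empties) the result
print(shutil.which("sh", path="/nonexistent"))  # None: not found on that PATH
```

Note this only applies when the file is executed directly (`./job.py` with the executable bit set); running `python job.py` bypasses the shebang entirely.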

Matt Doyle
  • Thanks for the response! But I don't think this solves my problem. I want to run it with `/bin/python` locally and `/path/to/weird/python` on the cluster. Are you suggesting that putting `#!/path/to/weird/python` in my job files will cause my cluster to run them with that? I thought it runs them as `python job.py` rather than `./job.py`. – Eli Rose Apr 11 '16 at 00:46
  • Hi Eli, is `/path/to/weird/python` the default Python environment on your Hadoop cluster? If so, having `#!/usr/bin/env python` in the .py file will run it from `/path/to/weird/python` on your Hadoop cluster, and from `/bin/python` when launching locally. – Matt Doyle Apr 11 '16 at 00:54
  • It is not, unfortunately. – Eli Rose Apr 11 '16 at 03:27