2

How do I set Python's optimized mode (the interpreter's -O flag) on an executor running on a Spark slave?

(Apparently the Python interpreter for the executor is launched using this line

 val pb = new ProcessBuilder(Arrays.asList(pythonExec, "-m", "pyspark.worker")) 

in org/apache/spark/api/python/PythonWorkerFactory.scala.

But I don't see a way of setting the -O flag.)

Joshua Fox

2 Answers

4

The Python executable is set by the PYSPARK_DRIVER_PYTHON or PYSPARK_PYTHON environment variable (the latter sets it for both the executors and the driver). You could create a wrapper script that runs python -O:

#!/bin/sh
exec python -O "$@"

Make the script executable (chmod +x), then use it by setting PYSPARK_PYTHON=/home/daniel/python_opt.sh.
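To check that the wrapper actually took effect on the executors, something like the following sketch might help. The function name `interpreter_flags` is made up for illustration, and the commented-out usage assumes an existing SparkContext called `sc`; neither comes from Spark itself.

```python
import sys


def interpreter_flags(_=None):
    """Report how the current Python interpreter was started.

    The ignored argument lets this function be passed directly to RDD.map().
    """
    return {
        # sys.flags.optimize is 0 without -O, 1 with -O, 2 with -OO.
        "optimize": sys.flags.optimize,
        # __debug__ is False under -O, which is what disables assert statements.
        "asserts_enabled": __debug__,
    }


if __name__ == "__main__":
    # Local check of the interpreter running this script.
    print(interpreter_flags())

    # Hypothetical cluster check (assumes a SparkContext named `sc` exists):
    # print(sc.parallelize(range(2), 2).map(interpreter_flags).collect())
```

If the wrapper is picked up, the values collected from the executors should show a non-zero `optimize` level, while the driver reports whatever flags it was itself started with.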

Daniel Darabos
-1

You cannot set -O on the Spark worker processes. This option is mostly useless anyway. (See What is the use of Python's basic optimizations mode? (python -O).)

Daniel Darabos
  • Sorry about the unhelpful answer, but I think this is the truth. What is your use-case for wanting to set `-O`? Do you see a measurable performance improvement in your code with `-O`? – Daniel Darabos Oct 03 '15 at 21:16
  • My code has lots of assertions, for use in testing. I want to disable them at runtime to avoid crashes. (We use assertions so that, for production mode, when there is an exception in a processing loop, the processing will continue, but in dev it stops.) – Joshua Fox Oct 04 '15 at 04:52
  • 1
    I disagree with this organization of the code (silent corruption is worse than stopping with an error), but you're right that this debate does not belong to this question. I've actually had an idea for a solution, so I've posted a different answer! Hope that's more useful. – Daniel Darabos Oct 04 '15 at 10:13
  • 1
    Thanks for your answer, which is what we need. Yes, I actually agree with you in general. Our code has loops over thousands of input data structures in which a bug causing failure in one item -- for example, unexpected input structures -- does not have to cause all to fail. – Joshua Fox Oct 14 '15 at 09:02