
I am creating a Spark application on AWS EMR, but spark-submit runs with Python 3 instead of Python 2. When I run pyspark instead, it uses Python 2.

How can I force spark-submit to use Python 2?

I tried to do

export PYSPARK_PYTHON=/usr/bin/python2 

but it didn't work.
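(For context, another way to pin the interpreter per submit is Spark's spark.pyspark.python property, which exists since Spark 2.1; whether the EMR release here honors it is an assumption:)

```shell
# Point both executors and the driver at Python 2 for a single submit.
# Property names are from the Spark 2.1+ docs; adjust the path for your cluster.
spark-submit \
  --conf spark.pyspark.python=/usr/bin/python2 \
  --conf spark.pyspark.driver.python=/usr/bin/python2 \
  code.py
```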

Thanks

Pierre

2 Answers


Have you tried inserting the

PYSPARK_PYTHON=/usr/bin/python2 

line into the spark-env.sh file?
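For example (paths are assumptions; $SPARK_HOME is wherever Spark is installed on each node):

```shell
# $SPARK_HOME/conf/spark-env.sh — sourced when Spark processes start.
# Set the interpreter for the executors and, optionally, the driver.
export PYSPARK_PYTHON=/usr/bin/python2
export PYSPARK_DRIVER_PYTHON=/usr/bin/python2
```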

Gus B
  • Do you mean I should do export PYSPARK_PYTHON=/usr/bin/python2 before running the script? I tried SSHing into the cluster and running spark-submit code.py manually, and it seems to run with Python 2. But when I do it with --steps spark-submit ... it runs Python 3. – Pierre Jul 10 '17 at 11:12
  • Hi, I mean whether you have added the PYSPARK_PYTHON environment variable into the $SPARK_HOME/conf/spark-env.sh file at your cluster nodes. The $SPARK_HOME is the directory where you installed Spark. – Gus B Jul 10 '17 at 12:00
  • I just tried that and it still doesn't work. So basically, when I call spark-submit from SSH it runs with Python 2, but when I add a 'spark-submit' step from the AWS console (or CLI) it runs Python 3. – Pierre Jul 10 '17 at 13:46
  • Actually, when I run print(sys.version_info) via spark-submit (adding a step with the AWS console) it says it is Python 2.6.9, but I get "SyntaxError: invalid syntax" if I try to run 'print "hello world"'. – Pierre Jul 10 '17 at 14:19
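On EMR specifically, environment variables for steps are usually set through a cluster configuration classification rather than by editing spark-env.sh on the nodes; a hedged sketch of the spark-env classification (check the docs for your EMR release):

```json
[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PYSPARK_PYTHON": "/usr/bin/python2"
        }
      }
    ]
  }
]
```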

Actually, I had this in my code:

from __future__ import print_function

so when I ran print 'hello world' it crashed, because after that import print is a function rather than the default print statement. I had assumed it was crashing because it was running Python 3 instead of Python 2.
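A minimal, version-agnostic sketch of the effect: once print_function is imported, print must be called as a function, and sys.version_info shows which interpreter is actually running.

```python
from __future__ import print_function  # makes print a function, even on Python 2
import sys

# The Python 2 statement form `print "hello world"` is now a SyntaxError;
# the function form works on both interpreters.
message = "hello world"
print(message)

# Check which interpreter is actually running the script.
major = sys.version_info[0]
```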

Pierre