
I'm running Spark 2.2.0 on YARN, trying to submit the Python file backtest.py with all of the project files zipped into prediction.zip. The spark-submit command is below.

The problem is that Spark can't find one of my modules. What am I missing?

HADOOP_CONF_DIR="/etc/hive/conf.cloudera.hive" \
SPARK_HOME="/opt/spark/spark-2.2.0-bin-hadoop2.7" \
PYSPARK_PYTHON="/opt/anaconda/bin/python" \
PYSPARK_DRIVER_PYTHON="/opt/anaconda/bin/python" \
sudo -u hdfs \
/opt/spark/spark-2.2.0-bin-hadoop2.7/bin/spark-submit \
--master yarn \
--conf "spark.sql.shuffle.partitions=2001" \
--conf "spark.executorEnv.PYTHONHASHSEED=0" \
--deploy-mode cluster \
--py-files /home/gals/prediction.zip \
/home/gals/parent/prediction/backtesting/backtest.py
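For --py-files to work, the package folder has to sit at the root of the archive (the zip itself is put on sys.path on the executors), i.e. the zip should be built from /home/gals/parent rather than from inside the project. A minimal sketch for checking the layout; the expected entry names below are illustrative, based on the paths in the command:

import zipfile

# Expect entries like "prediction/__init__.py" and
# "prediction/backtesting/__init__.py"; if the top-level "prediction/"
# folder is missing or named differently, imports of the form
# "from prediction... import ..." will fail on the executors.
with zipfile.ZipFile("/home/gals/prediction.zip") as zf:
    for name in zf.namelist():
        print(name)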
  • Please let me know if you want some more info... – Gal Shaboodi Mar 12 '18 at 15:23
  • Could you please take a look at [this](https://stackoverflow.com/questions/47157793/spark-runs-in-local-but-cant-find-file-when-running-in-yarn/47159165#47159165) ? In general, make sure your zip and your python file is hosted somewhere accessible to all nodes, not on your machine. – mkaran Mar 12 '18 at 15:30
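Following up on the comment above, a minimal sketch of shipping the archive from a location every node can reach, instead of the local filesystem; the HDFS path is hypothetical and assumes the zip has already been uploaded there:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("backtest").getOrCreate()

# addPyFile distributes the archive to every executor and adds it to
# sys.path, so top-level packages inside the zip become importable.
spark.sparkContext.addPyFile("hdfs:///user/hdfs/prediction.zip")  # hypothetical path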

0 Answers