
I'm running Spark 2.2.0 on YARN, trying to submit the Python file backtest.py with all of the project files zipped into prediction.zip. The spark-submit command is below.

The problem is that Spark can't find one of my modules. What am I missing?

HADOOP_CONF_DIR="/etc/hive/conf.cloudera.hive" \
SPARK_HOME="/opt/spark/spark-2.2.0-bin-hadoop2.7" \
PYSPARK_PYTHON="/opt/anaconda/bin/python" \
PYSPARK_DRIVER_PYTHON="/opt/anaconda/bin/python" \
sudo -u hdfs \
/opt/spark/spark-2.2.0-bin-hadoop2.7/bin/spark-submit \
--master yarn \
--conf "spark.sql.shuffle.partitions=2001" \
--conf "spark.executorEnv.PYTHONHASHSEED=0" \
--deploy-mode cluster \
--py-files /home/gals/prediction.zip \
/home/gals/parent/prediction/backtesting/backtest.py
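For --py-files to work, the package folder has to sit at the root of the archive (the zip itself is put on sys.path on the executors), i.e. the zip should be built from /home/gals/parent rather than from inside the project. A minimal sketch for checking the layout; the expected entry names below are illustrative, based on the paths in the command:

import zipfile

# Expect entries like "prediction/__init__.py" and
# "prediction/backtesting/__init__.py"; if the top-level "prediction/"
# folder is missing or named differently, imports of the form
# "from prediction... import ..." will fail on the executors.
with zipfile.ZipFile("/home/gals/prediction.zip") as zf:
    for name in zf.namelist():
        print(name)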
  • Please let me know if you want some more info... – Gal Shaboodi Mar 12 '18 at 15:23
  • Could you please take a look at [this](https://stackoverflow.com/questions/47157793/spark-runs-in-local-but-cant-find-file-when-running-in-yarn/47159165#47159165) ? In general, make sure your zip and your python file is hosted somewhere accessible to all nodes, not on your machine. – mkaran Mar 12 '18 at 15:30
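Following up on the comment above, a minimal sketch of shipping the archive from a location every node can reach, instead of the local filesystem; the HDFS path is hypothetical and assumes the zip has already been uploaded there:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("backtest").getOrCreate()

# addPyFile distributes the archive to every executor and adds it to
# sys.path, so top-level packages inside the zip become importable.
spark.sparkContext.addPyFile("hdfs:///user/hdfs/prediction.zip")  # hypothetical path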

0 Answers