I have a Python package, with many modules, built into an .egg file, and I want to use it inside a Zeppelin notebook. According to the Zeppelin documentation, to pass this package to the Zeppelin Spark interpreter, you can export it through the --files option in SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh. I have the following questions regarding this:
In the pyspark shell, the .egg file given with --py-files works (i.e. I am able to import the modules inside the package), while the same .egg file given with the --files option does not work (ImportError: No module named XX.xx).
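For reference, these are the two pyspark shell invocations I am comparing (using the same egg path as in my zeppelin-env.sh below):

```shell
# Works: --py-files adds the egg to the Python path on the driver and executors
$SPARK_HOME/bin/pyspark \
  --py-files /home/me/models/Churn-zeppelin/package/build/dist/fly_libs-1.1-py2.7.egg

# Fails with ImportError: --files only ships the file to the working directory,
# it does not put the egg on the Python path
$SPARK_HOME/bin/pyspark \
  --files /home/me/models/Churn-zeppelin/package/build/dist/fly_libs-1.1-py2.7.egg
```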
Adding the .egg file via the --py-files option in SPARK_SUBMIT_OPTIONS in Zeppelin causes this error:
Error: --py-files given but primary resource is not a Python script.
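This is the variant of SPARK_SUBMIT_OPTIONS that triggers the error above (same jars and egg path as in my current config, with --files swapped for --py-files):

```shell
# zeppelin-env.sh variant that produces
# "Error: --py-files given but primary resource is not a Python script."
export SPARK_SUBMIT_OPTIONS="--jars /home/me/spark-csv-1.5.0-s_2.10.jar,/home/me/commons-csv-1.4.jar --py-files /home/me/models/Churn-zeppelin/package/build/dist/fly_libs-1.1-py2.7.egg"
```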
As per my understanding, whatever is given in SPARK_SUBMIT_OPTIONS is passed to the spark-submit command, so why is --py-files throwing an error? When I add the .egg through the --files option in SPARK_SUBMIT_OPTIONS, the Zeppelin notebook does not throw an error, but I am not able to import the module inside the notebook.
What's the correct way to pass an .egg file to the Zeppelin Spark interpreter?
The Spark version is 1.6.2 and the Zeppelin version is 0.6.0.
The zeppelin-env.sh file contains the following:
export SPARK_HOME=/home/me/spark-1.6.1-bin-hadoop2.6
export SPARK_SUBMIT_OPTIONS="--jars /home/me/spark-csv-1.5.0-s_2.10.jar,/home/me/commons-csv-1.4.jar --files /home/me/models/Churn-zeppelin/package/build/dist/fly_libs-1.1-py2.7.egg"