I am submitting PySpark jobs to the cluster through Livy. Currently, the dependent Python packages such as NumPy, Pandas, and Keras are installed on every datanode. Can these packages instead be stored centrally in HDFS, and how would I configure Livy and PySpark to read them from HDFS rather than from the local installation on each datanode?
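For reference, this is roughly the kind of submission I have in mind (only a sketch, assuming the environment is packed into an archive with something like conda-pack and uploaded to HDFS; the Livy URL, HDFS paths, and archive/alias names below are placeholders, not my actual setup):

```python
# Sketch: submit a Livy batch that ships a packed Python environment from HDFS
# instead of relying on packages installed on each datanode.
# All URLs, paths, and names are placeholders.
import json
import requests

LIVY_URL = "http://livy-server:8998/batches"  # placeholder Livy endpoint

payload = {
    "file": "hdfs:///user/me/jobs/my_job.py",  # PySpark script to run
    # Packed environment (e.g. built with conda-pack) stored centrally in HDFS;
    # YARN unpacks the archive on each node under the alias after '#'.
    "archives": ["hdfs:///user/me/envs/pyspark_env.tar.gz#environment"],
    "conf": {
        # Point the driver and executors at the Python inside the unpacked archive
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON": "./environment/bin/python",
        "spark.executorEnv.PYSPARK_PYTHON": "./environment/bin/python",
    },
    "name": "pyspark-hdfs-env-test",
}

resp = requests.post(
    LIVY_URL,
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
print(resp.status_code, resp.json())
```

Is something along these lines the right direction, or is there a better way to make Livy/PySpark pick the dependencies up from HDFS?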
Did you find a solution to your question? – Bleser Jun 10 '19 at 07:01