
I am submitting PySpark jobs to the cluster through Livy. Currently, the dependent Python packages (NumPy, Pandas, Keras, etc.) are installed on all the datanodes. I was wondering whether all of these packages can be stored centrally in HDFS instead, and how Livy and PySpark can be configured to read them from HDFS rather than from each datanode.
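For context, a common pattern (sketched here as an assumption about the setup, not something I have verified on this cluster) is to pack the Python environment into an archive with a tool like conda-pack or venv-pack, upload it to HDFS, and reference it in the Livy batch request via `archives`, pointing `spark.pyspark.python` at the unpacked environment. The HDFS paths and job file below are placeholders:

```python
import json

# Hypothetical Livy POST /batches payload: ship a packed Python environment
# stored in HDFS and point executors' Python at the unpacked archive.
payload = {
    "file": "hdfs:///jobs/my_job.py",  # placeholder path to the PySpark script
    # YARN unpacks the archive into a directory named "environment"
    # in each container's working directory (the "#environment" alias).
    "archives": ["hdfs:///envs/pyspark_env.tar.gz#environment"],
    "conf": {
        "spark.pyspark.python": "environment/bin/python",
    },
}

# This JSON would be POSTed to http://<livy-host>:8998/batches
# with Content-Type: application/json.
print(json.dumps(payload, indent=2))
```

Is something along these lines the recommended approach, or is there a better way to centralize the dependencies?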

zero323
