I placed two scripts, pySparkScript.py and relatedModule.py, on the /user/usr1/ path in HDFS. relatedModule.py is a Python module that is imported into pySparkScript.py.

I can run the scripts with spark-submit pySparkScript.py

However, I need to run these scripts through Livy. Normally, I submit single scripts successfully as follows:

curl -H "Content-Type:application/json" -X POST -d '{"file": "/user/usr1/pySparkScript.py"}' livyNodeAddress/batches

However, when I run the above command, it fails as soon as it reaches import relatedModule. I realize I should also pass the path of relatedModule.py in the Livy request parameters. I tried the following option:

curl -H "Content-Type:application/json" -X POST -d '{"file": "/user/usr1/pySparkScript.py", "files": ["/user/usr1/relatedModule.py"]}' livyNodeAddress/batches

How should I pass both files to Livy?

1 Answer

Try the pyFiles property instead of files; please refer to the Livy REST API docs. pyFiles lists the Python files to be used in the session, so the dependency ends up on the Python path and can be imported by the main script.
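
For example, a sketch of the POST request assuming the same HDFS paths and Livy endpoint as in the question:

curl -H "Content-Type:application/json" -X POST -d '{"file": "/user/usr1/pySparkScript.py", "pyFiles": ["/user/usr1/relatedModule.py"]}' livyNodeAddress/batches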