
I've learned how to set up PySpark on a multi-node cluster after a lot of googling. Now I've written my first PySpark script, which just creates a DataFrame and prints the data inside it.

But now I want to actually run my PySpark code; I named it "firstcode.py". I've searched but couldn't find a clue how to do that. How do I run it, and from where should I execute the script to run my "firstcode.py" file?
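For context, a script like the one described might look like this minimal sketch (the `SparkSession` setup, sample rows, and column names here are illustrative assumptions, not the asker's actual code):

```python
# firstcode.py -- minimal PySpark script: build a DataFrame and print it.
# Requires a working Spark installation; launched via spark-submit.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("firstcode").getOrCreate()

# A small in-memory DataFrame; the data is just an illustration.
df = spark.createDataFrame(
    [(1, "alice"), (2, "bob")],
    ["id", "name"],
)

df.show()  # prints the DataFrame contents as a table
spark.stop()
```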

1 Answer


You can submit your code with spark-submit from any host in your cluster. Resource usage, additional libraries, and so on can be configured through its options. The master URL can vary: yarn, local, standalone, kubernetes, etc. See "Submitting Applications" in the Spark documentation for details.

Sample script with YARN as the master:

spark-submit \
 --master yarn \
 --deploy-mode cluster \
 --executor-memory 1g \
 --num-executors 2 \
 myCode.py

Sample script for a standalone Spark cluster:

spark-submit \
--master spark://host_ip:7077 \
--deploy-mode cluster \
--executor-memory 1g \
myCode.py
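Since local is also a valid master, it can be convenient to test the script on a single machine before submitting it to the cluster. A plausible invocation (assuming spark-submit is on your PATH and the file is named firstcode.py as in the question) would be:

```shell
# Run the script on the local machine only, using all available cores.
spark-submit --master "local[*]" firstcode.py
```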