I came across an interesting question about the different methods of submitting a Spark application from a Windows development environment. Generally, we submit a Spark job using spark-submit, but we can also execute an uber jar (with the dependent Spark libraries assembled into it) using java -jar:
- Command using java -jar:
java -Xmx1024m -jar /home/myuser/myhar.jar
- Command using spark-submit:
spark-submit --master local[*] /home/myuser/myhar.jar
Since I can execute the job using both methods, I observed that sometimes the java -jar method is faster and sometimes spark-submit is faster for the same data set (say 20,000 rows with a lot of data-shuffling logic inside). spark-submit gives better control over executors, memory, etc. through command-line arguments, whereas with java -jar we have to hard-code those settings in the code itself. If we run the jar with a large data set, java -jar throws an out-of-memory exception, while spark-submit takes longer but executes without error with the default configuration.
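To show what I mean by hard-coding, here is a minimal sketch of the kind of driver code I have in mind (class name and values are just illustrative, not my actual job):

import org.apache.spark.sql.SparkSession

object MyJob {
  def main(args: Array[String]): Unit = {
    // Master and memory are fixed in the code here; with spark-submit the same
    // values would normally be passed as --master / --driver-memory instead.
    val spark = SparkSession.builder()
      .appName("MyJob")
      .master("local[*]")
      // Note: in local mode the driver heap is really governed by the JVM's -Xmx;
      // setting spark.driver.memory programmatically has no effect once the JVM is running.
      .config("spark.driver.memory", "1g")
      .getOrCreate()

    // ... actual job logic with the shuffling goes here ...

    spark.stop()
  }
}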
I don't understand the difference between submitting the application using spark-submit and using java -jar, hence my question:
How does the execution happen when we submit the application using java -jar? Does it execute inside the JVM memory itself, without using any Spark properties?
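If it helps, a small sketch like the following is how one could compare which heap size and Spark properties each launch method actually picks up (names are illustrative):

import org.apache.spark.sql.SparkSession

object ConfCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ConfCheck")
      .master("local[*]")
      .getOrCreate()

    // Maximum heap available to this JVM: governed by -Xmx when launched with
    // java -jar, or by spark.driver.memory when launched through spark-submit.
    println(s"JVM max heap: ${Runtime.getRuntime.maxMemory / (1024 * 1024)} MB")

    // Effective Spark properties picked up at runtime under this launch method.
    spark.sparkContext.getConf.getAll.sortBy(_._1).foreach {
      case (key, value) => println(s"$key = $value")
    }

    spark.stop()
  }
}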