Questions tagged [spark-submit]

spark-submit is the script used to launch apache-spark applications written in, e.g., Java, Scala, or Python.

More information about spark-submit can be found in the official Spark documentation on submitting applications.

611 questions
5 votes · 2 answers

AWS EMR add step: How to add multiple jars from s3 in --jars and --driver-class-path options?

So I am trying to run an Apache Spark application on AWS EMR in cluster mode using spark-submit. If I have only one jar to provide in the classpath, it works fine with the --jars and --driver-class-path options. All of my required…
— CodeHunter
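A minimal sketch of what such a submit might look like (the bucket and class names are placeholders, not from the question):

```shell
# --jars takes a comma-separated list; jars passed this way are
# distributed to the cluster and added to the executor classpaths.
# --driver-class-path is a plain classpath, so its entries are
# colon-separated. Whether it is needed at all in cluster mode
# depends on the Spark/YARN version.
spark-submit \
  --deploy-mode cluster \
  --class com.example.Main \
  --jars s3://my-bucket/libs/dep1.jar,s3://my-bucket/libs/dep2.jar \
  s3://my-bucket/app/my-app.jar
```

The key point is the separator: commas for --jars, colons for --driver-class-path.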
5 votes · 2 answers

Run a spark job: python vs spark-submit

The common way of running a Spark job appears to be using spark-submit as below (source): spark-submit --py-files pyfile.py,zipfile.zip main.py --arg1 val1 Being new to Spark, I wanted to know why this first method is preferred over running it…
— Mint
5 votes · 2 answers

Spark: How to set spark.yarn.executor.memoryOverhead property in spark-submit

In Spark 2.0, how do you set the spark.yarn.executor.memoryOverhead property when you run spark-submit? I know that for things like spark.executor.cores you can set --executor-cores 2. Is it the same pattern for this property? e.g. --yarn-executor-memoryOverhead…
— Micah Pearce
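There is no dedicated flag for this property; a sketch of the usual pattern for arbitrary properties (the jar name is a placeholder):

```shell
# Properties without a dedicated command-line flag are passed with
# --conf key=value; the overhead value here is in MiB.
spark-submit \
  --master yarn \
  --executor-cores 2 \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  my-app.jar
```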
5 votes · 0 answers

pyarrow through spark-submit in cluster mode fails

I have a simple PySpark code: import pyarrow fs = pyarrow.hdfs.connect() If I run this using spark-submit in "client" mode, it works fine, but in "cluster" mode it throws the error Traceback (most recent call last): File "t3.py", line 17, in…
— VShankar
5 votes · 0 answers

PySpark dependency modules in spark submit

I'm trying to run a spark-submit (PySpark) command. As part of the spark-submit, I need to provide boto3 as a dependency, since my code depends on it. I'm running the below command and getting a no-module error. bin/spark-submit --master=local…
— data_addict
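One common pattern for shipping a pure-Python dependency like boto3 to the executors, sketched with placeholder file names:

```shell
# Install the dependency into a local folder, zip the folder's
# contents, and distribute the archive with --py-files so it lands
# on the executors' PYTHONPATH.
pip install boto3 -t deps/
cd deps && zip -r ../deps.zip . && cd ..
spark-submit --master local --py-files deps.zip my_script.py
```

This works because boto3 is pure Python; packages with native extensions need a different approach.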
5 votes · 1 answer

AWS EMR Spark Cluster - Steps with Scala fat JAR, can't find MainClass

I have a fat jar, written in Scala, packaged by sbt. I need to use it in a Spark cluster in AWS EMR. It functions fine if I manually spin up the cluster, copy the jar to the master and run a spark-submit job using a command like…
— kmh
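When the manifest's Main-Class is not picked up, the usual fix is to name the entry point explicitly; a sketch with placeholder names:

```shell
# --class overrides any Main-Class declared in the jar's manifest;
# com.example.Main and the S3 path are placeholders.
spark-submit \
  --deploy-mode cluster \
  --class com.example.Main \
  s3://my-bucket/my-assembly.jar
```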
5 votes · 2 answers

spark-submit config through file

I am trying to deploy a Spark job using spark-submit, which has a bunch of parameters, like spark-submit --class Eventhub --master yarn --deploy-mode cluster --executor-memory 1024m --executor-cores 4 --files app.conf spark-hdfs-assembly-1.0.jar…
— roy
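Those parameters can be moved out of the command line with --properties-file; a sketch using the question's own values:

```shell
# spark.conf uses the same key=value format as conf/spark-defaults.conf:
#   spark.master               yarn
#   spark.submit.deployMode    cluster
#   spark.executor.memory      1024m
#   spark.executor.cores       4
spark-submit \
  --properties-file spark.conf \
  --class Eventhub \
  --files app.conf \
  spark-hdfs-assembly-1.0.jar
```

Without --properties-file, spark-submit reads conf/spark-defaults.conf by default.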
4 votes · 0 answers

Stop and Restart SparkContext executing in deploy mode "cluster"

In order to meet the efficiency requirements, I am forced to stop the SparkContext and restart it with a new configuration that is more optimal in terms of number of executors, memory per executor, executor memory overhead… I can achieve this by launching my…
4 votes · 2 answers

Suppress messages from spark-submit when loading packages

If you try this: spark-submit \ --packages "org.apache.hadoop:hadoop-aws:2.7.4" \ pyspark-example.py you will get a large amount of noisy output as spark-submit resolves all the dependencies of the hadoop-aws package and downloads them. You get…
— Nick Chammas
4 votes · 1 answer

Spark-submit issue loading classes

I'm using HDP 2.6. I downloaded the newest version of Spark (2.2.1) and, using spark-submit, I'm trying to run my jar (built with the same version of Spark as an assembly). However, I'm getting the error: Class…
4 votes · 2 answers

launch Python app with spark-submit in AWS EMR

I'm new to Spark and having trouble replicating the example in the EMR docs for submitting a basic user application with spark-submit via AWS CLI. It seems to run without error but produces no output. Is something wrong with my syntax for add-steps…
— C8H10N4O2
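A sketch of the add-steps shape for a Python application (the cluster id and S3 path are placeholders):

```shell
# The Args list is comma-separated and maps one-to-one onto a
# spark-submit command line; EMR runs it as a Spark step.
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps 'Type=Spark,Name=MyPyApp,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,s3://my-bucket/main.py]'
```

A silent run with no output often means the application's stdout went to the step logs in S3 rather than the terminal.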
4 votes · 1 answer

How to run SparkR script using spark-submit or sparkR on an EMR cluster?

I have written a sparkR code and wondering if I can submit it using spark-submit or sparkR on an EMR cluster. I have tried several ways for example: sparkR mySparkRScript.r or sparkR --no-save mySparkScript.r etc.. but every time I am getting below…
4 votes · 2 answers

In Spark code, manage conf.setMaster() using a config file to auto-select local or yarn-cluster

So while developing Spark programs, I use my local machine and hence have to setMaster to "local". However, when I submit the jar built from my locally developed program, I obviously do not want to use "local" mode. How can I make use of perhaps…
— human
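A sketch of the usual resolution: leave setMaster() out of the code entirely and let spark-submit supply it, so the same jar runs unchanged in both environments (the jar path is a placeholder):

```shell
# Developing locally:
spark-submit --master 'local[*]' target/my-app.jar

# On the cluster, the same artifact with a different --master:
spark-submit --master yarn --deploy-mode cluster target/my-app.jar
```

Values passed on the command line take precedence, so the code only needs a bare SparkConf (or SparkSession.builder.getOrCreate()) with no master hard-coded.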
4 votes · 1 answer

Can we use the Spark session object without explicitly creating it, if we submit a job via spark-submit?

My question is very basic; my code is working fine, but I am not clear on these two points: 1) When we submit any PySpark job using spark-submit, do we need to create a Spark session object like this? In my script: from pyspark.sql import…
— user07
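A sketch of the distinction: unlike the pyspark shell, spark-submit does not inject a ready-made `spark` variable, so the script builds (or reuses) the session itself:

```shell
# main.py is written inline here only to keep the sketch self-contained.
cat > main.py <<'EOF'
from pyspark.sql import SparkSession

# getOrCreate() returns the existing session if one is already
# running, otherwise it creates a new one.
spark = SparkSession.builder.appName("example").getOrCreate()
print(spark.version)
spark.stop()
EOF

spark-submit main.py
```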
4 votes · 1 answer

Spark YARN mode: how to get the applicationId from spark-submit

When I submit a Spark job using spark-submit with master yarn and deploy-mode cluster, it doesn't print/return any applicationId, and once the job is completed I have to manually check the MapReduce JobHistory or the Spark HistoryServer to get the job details. My…
— Rahul Sharma
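One rough workaround sketch: in yarn-cluster mode the launcher logs the YARN application id to stderr, so it can be grepped out of that output (the log format is an assumption and may vary between versions; the jar name is a placeholder):

```shell
# Capture the submit output and extract the first token that looks
# like a YARN application id (application_<timestamp>_<sequence>).
APP_ID=$(spark-submit --master yarn --deploy-mode cluster my-app.jar 2>&1 \
  | grep -o 'application_[0-9]*_[0-9]*' | head -n 1)
echo "$APP_ID"
```

From inside the job itself, sc.applicationId gives the same value programmatically.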