Questions tagged [spark-submit]

spark-submit is the script used to launch Apache Spark applications written in, e.g., Java, Scala, or Python.

More information about spark-submit can be found in the official Spark documentation.

611 questions
3 votes, 1 answer

Spark-Submit: Cannot run with virtualenv

I have a Python app that I want to run in a virtual environment using spark-submit. Here is my command: PYSPARK_PYTHON=./venv/bin/python spark-submit --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./venv/bin/python --master yarn --deploy-mode…
AbtPst
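A common pattern for this situation, sketched below, is to ship the zipped virtualenv with --archives and point both the application master and the executors at its interpreter. The archive name, paths, and app.py are illustrative, not taken from the question.

```shell
# Sketch: venv.zip is a packed virtualenv; the "#venv" suffix unpacks it
# under the alias ./venv inside each YARN container.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --archives venv.zip#venv \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./venv/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=./venv/bin/python \
  app.py
```

In client mode the driver runs with the local interpreter, so the PYSPARK_PYTHON environment variable on the submitting shell still matters there.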
3 votes, 1 answer

Dataproc arguments not being read on spark submit

I am using Dataproc to submit jobs on Spark. However, on spark-submit, non-Spark arguments are being read as Spark arguments! I am receiving the error/warning below when running a particular job. Warning: Ignoring non-spark config property:…
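With gcloud, Spark properties and application arguments travel through different channels, which is often the source of this confusion. A sketch (cluster name, property, and application arguments are placeholders):

```shell
# Spark settings go in --properties; everything after "--" is handed to
# the application itself, so it is never parsed as a Spark option.
gcloud dataproc jobs submit pyspark app.py \
  --cluster=my-cluster \
  --properties=spark.executor.memory=4g \
  -- --input gs://bucket/in --output gs://bucket/out
```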
3 votes, 0 answers

How do you make spark-submit arguments come right after "spark-submit" when debugging in vs code insiders

I am debugging a Spark job (PySpark) in VS Code in a remote SSH session (VS Code Insiders). Using VS Code Insiders 1.35.0 with Remote SSH. This is my launch.json { "version": "0.2.0", "configurations": [ { "name": "Python:…
okyere
3 votes, 1 answer

HBase doesn't work well with spark-submit

I have an app that does some work and at the end needs to read some file from HDFS and store it into HBase. The app runs with master local without issue using Apache Spark, but when I run it using spark-submit it doesn't work anymore, I get…
Salvatore Nedia
Salvatore Nedia
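A frequent cause of local-works-but-cluster-fails with HBase is that the HBase client jars and hbase-site.xml never reach the executors. One sketch of a fix, assuming a YARN cluster (the class name com.example.HBaseWriter is hypothetical):

```shell
# `hbase mapredcp` prints the client jars HBase needs, colon-separated;
# --jars wants a comma-separated list, hence the tr.
spark-submit \
  --master yarn \
  --jars "$(hbase mapredcp | tr ':' ',')" \
  --files /etc/hbase/conf/hbase-site.xml \
  --class com.example.HBaseWriter \
  app.jar
```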
3 votes, 1 answer

Difference between running spark application as standalone vs spark submit / spark launcher?

I am exploring different options to package a Spark application and I am confused about which is the best mode and what the differences are between the following modes: submit the Spark application's jar to spark-submit; construct a fat jar out of the Spark Gradle…
Mozhi
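Whichever packaging route is chosen, spark-submit ultimately expects a jar plus an entry class, and the SparkLauncher API just wraps the same invocation programmatically. A minimal sketch, with illustrative class and jar names:

```shell
# A fat (assembly) jar submitted the usual way; the entry point is
# named explicitly with --class.
spark-submit \
  --class com.example.Main \
  --master spark://master:7077 \
  app-assembly-0.1.jar
```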
3 votes, 0 answers

Warning: Skip remote jar hdfs

I would like to submit a Spark job, configuring an additional jar on HDFS; however, Hadoop gives me a warning about skipping the remote jar. Although I can still get my final results on HDFS, I cannot obtain the effect of the additional remote jar. I would…
Neo
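The "Skip remote jar" warning is commonly associated with client mode, where spark-submit does not pull hdfs:// jars onto the driver's classpath. One often-suggested workaround is to submit in cluster mode, where YARN localizes the remote jars; paths below are placeholders:

```shell
# In yarn-cluster mode, YARN copies hdfs:// jars onto the containers,
# so the remote jar actually reaches the driver's classpath.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars hdfs:///libs/extra.jar \
  --class com.example.Main \
  hdfs:///apps/app.jar
```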
3 votes, 1 answer

Spark-Submit Error: Cannot load main class from JAR file

I am trying to spark-submit an application in Scala cluster mode. It was working fine in PySpark, but while trying to run with Scala the above error pops up. If I have to add SBT and Maven dependencies, can you elaborate the procedure (I am not…
Fasty
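Unlike a PySpark script, a Scala/Java jar needs an explicit --class, and that class must actually exist inside the jar, which `jar tf` can confirm. A sketch with illustrative jar and class names:

```shell
# Verify the class is in the jar, then name it explicitly on submit.
jar tf target/scala-2.12/app_2.12-0.1.jar | grep Main
spark-submit \
  --class com.example.Main \
  --master yarn --deploy-mode cluster \
  target/scala-2.12/app_2.12-0.1.jar
```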
3 votes, 1 answer

Reducing Apache Spark Startup Time

I am running a standalone Spark cluster and submitting my applications (written in SparkR) using spark-submit in client mode. I have a set of applications that I have to run according to the user's input, so I can't keep them running. Each time, to…
Piyush Shrivastava
3 votes, 2 answers

Issues with Scala ScriptEngine inside spark submit application

I am working on a system where I let users write DSLs, and I load them as instances of my Type during runtime; these can be applied on top of RDDs. The entire application runs as a spark-submit application and I use the ScriptEngine engine to compile…
3 votes, 2 answers

List of spark-submit options

There are a ton of tunable settings mentioned on the Spark configurations page. However, as told here, the SparkSubmitOptionParser attribute name for a Spark property can be different from that property's name. For instance, spark.executor.cores is…
y2k-shubham
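The full flag list is printed by spark-submit itself, and for properties that have a dedicated flag, that flag and a generic --conf are two spellings of the same setting. A sketch (jar and class names are placeholders):

```shell
# --help lists every dedicated flag; the two submissions below set the
# same property, once via its flag and once via --conf.
spark-submit --help
spark-submit --executor-cores 4 --class com.example.Main app.jar
spark-submit --conf spark.executor.cores=4 --class com.example.Main app.jar
```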
3 votes, 2 answers

spark-submit with specific python libraries

I have PySpark code depending on third-party libraries. I want to execute this code on my cluster, which runs under Mesos. I do have a zipped version of my Python environment on an HTTP server reachable by my cluster. I have some trouble to…
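For pure-Python dependencies, one common approach is --py-files, which accepts local, HDFS, or HTTP URIs; the archive name and host below are placeholders, not from the question:

```shell
# deps.zip holds the third-party packages; the URI may be an http://
# address as long as the cluster can reach it.
spark-submit \
  --master mesos://master:5050 \
  --py-files http://fileserver/deps.zip \
  app.py
```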
3 votes, 1 answer

Set PySpark Serializer in PySpark Builder

I am using PySpark 2.1.1 and am trying to set the serializer when using spark-submit. In my application, I initialize the SparkSession.builder as follows: print("creating spark session") spark =…
Max
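One detail worth noting: spark.serializer configures the JVM-side serializer and generally needs to be fixed before the JVM launches, so passing it to spark-submit is safer than setting it in the builder; it is also distinct from PySpark's Python-side (pickle) serializer. A sketch:

```shell
# Sets the JVM-side serializer at submit time; app.py is a placeholder.
spark-submit \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  app.py
```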
3 votes, 1 answer

GCP Dataproc spark.jar.packages issue downloading dependencies

When creating our Dataproc Spark cluster we pass --properties spark:spark.jars.packages=mysql:mysql-connector-java:6.0.6 to the gcloud dataproc clusters create command. This is for our PySpark scripts to save to CloudSQL. Apparently on creation this…
Tom Lous
3 votes, 1 answer

How to set up Google Cloud Storage correctly for a Spark application using AWS Data Pipeline

I am setting up the cluster step to run a Spark application using AWS Data Pipeline. My job is to read data from S3, process the data, and write it to Google Cloud Storage. For Google Cloud Storage, I am using a service account with a key file.…
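A typical wiring for this, sketched below, passes the Hadoop GCS connector jar plus the service-account credentials through spark.hadoop.* properties; the property names come from the GCS connector, while the jar name and key path are placeholders:

```shell
# GCS connector jar plus service-account credentials, forwarded into the
# Hadoop configuration via spark.hadoop.* properties.
spark-submit \
  --jars gcs-connector-hadoop2-latest.jar \
  --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  --conf spark.hadoop.google.cloud.auth.service.account.enable=true \
  --conf spark.hadoop.google.cloud.auth.service.account.json.keyfile=/path/to/key.json \
  app.jar
```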
3 votes, 1 answer

Zeppelin notebook execute not manual

Is there a way to execute the Spark code in a Zeppelin notebook without having to do it interactively? I'm looking for something specific, or if anyone could point me in the correct direction. Or, alternatively, other ways to submit Spark code, which…
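Zeppelin exposes a REST API for running notes outside the UI, which is one non-interactive option; the host, port, and note id below are placeholders:

```shell
# Run every paragraph of a note via Zeppelin's REST API; 2A94M5J1Z
# stands in for a real note id.
curl -X POST http://zeppelin-host:8080/api/notebook/job/2A94M5J1Z
```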