Questions tagged [spark-submit]

spark-submit is a script used to launch apache-spark applications written in, for example, Java, Scala, or Python.

More information about spark-submit can be found here.

611 questions
1
vote
0 answers

How to pass special characters while submitting a job to a Dataproc cluster

I am trying to pass a URL as a key-value pair in a Dataproc submit job; however, this URL has "=" in it, so the part after the "=" is not being considered. How do I pass the whole URL as a key value…
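A hedged sketch (the cluster name, region, script path, and property key below are placeholders, not taken from the question): passing the URL after the "--" separator hands it to the application untouched, which sidesteps the key=value parsing; quoting the whole --properties assignment is a second thing worth checking.

```bash
# Placeholders: my-cluster, us-central1, gs://my-bucket/job.py, spark.myapp.url.
# Option 1: pass the URL as a job argument. Everything after "--" is forwarded
# to the application as-is, so the embedded "=" is never parsed by gcloud.
gcloud dataproc jobs submit pyspark gs://my-bucket/job.py \
  --cluster=my-cluster --region=us-central1 \
  -- "https://example.com/callback?session=abc123"

# Option 2: if it has to be a property, quote the whole assignment so the
# shell does not split it before gcloud sees it.
gcloud dataproc jobs submit pyspark gs://my-bucket/job.py \
  --cluster=my-cluster --region=us-central1 \
  --properties='spark.myapp.url=https://example.com/callback?session=abc123'
```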
1
vote
1 answer

Apache Hudi deltastreamer throwing Exception in thread "main" org.apache.hudi.com.beust.jcommander.ParameterException: no main parameter was defined

Versions: Apache Hudi 0.6.1, Spark 2.4.6. Below is the standard spark-submit command for the Hudi deltastreamer, where it throws "no main parameter was defined". I can see that all the property parameters are given. I would appreciate any help on this…
Nizam
  • 77
  • 2
  • 11
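In my experience this jcommander error means DeltaStreamer received a stray positional token, most often because a trailing backslash is missing and the command got split. A minimal shape sketch, with placeholder jar, table, and property paths; the option names reflect the usual DeltaStreamer flags, so double-check them against your version's --help.

```bash
# Shape sketch only; paths and table names are placeholders. Every continued
# line must end with "\": a missing one turns the rest of the command into a
# bare positional argument, which is exactly what jcommander rejects with
# "no main parameter was defined".
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /path/to/hudi-utilities-bundle_2.11-0.6.1.jar \
  --table-type COPY_ON_WRITE \
  --props file:///path/to/source.properties \
  --target-base-path s3://my-bucket/hudi/my_table \
  --target-table my_table
```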
1
vote
0 answers

How to Completely Stop or Shutdown Spark Java Application

I have a Spark application written in Java which runs every day. I have a requirement to exit the program (stop the Spark application) based on a condition. I am currently using System.exit(0) for shutting down the application. But I see the…
user1326784
  • 627
  • 3
  • 11
  • 31
1
vote
1 answer

Spark-submit command is returning a "missing application resource" error

To start things off, I created a jar file following "How to build jars from IntelliJ properly?". My jar file's path is out/artifacts/sparkProgram_jar/sparkProgram.jar. My Spark program, in general, reads a table from MongoDB, transforms it using…
Sajeed
  • 119
  • 10
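"Error: Missing application resource." means spark-submit never saw a jar (or .py file) to run: all options must come before the application jar, and the first non-option token is treated as that jar. A sketch reusing the jar path from the question; the main class and master are assumptions.

```bash
# The main class com.example.SparkProgram and the local[*] master are placeholders.
spark-submit \
  --class com.example.SparkProgram \
  --master "local[*]" \
  out/artifacts/sparkProgram_jar/sparkProgram.jar
```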
1
vote
0 answers

Failed to load class in Spark-submit

I'm running a jar file with spark-submit, but I keep getting this error: Error: Failed to load class antarctic.DataQuality. This is the command: spark-submit --class antarctic.DataQuality --master local[*] --deploy-mode client --jars…
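This error usually means the fully qualified class is not inside the jar passed as the application resource (a dependency listed in --jars is not enough), or the package name does not match. A quick check, with the jar path below as a placeholder:

```bash
# Placeholder jar path. The main class must live in the application jar itself;
# --jars only adds dependency jars to the classpath.
jar tf target/data-quality.jar | grep 'antarctic/DataQuality'

# If the class is listed, point --class at exactly that name and give the jar
# containing it as the last (application) argument:
spark-submit --class antarctic.DataQuality --master "local[*]" \
  --deploy-mode client target/data-quality.jar
```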
1
vote
2 answers

SparkSubmitOperator vs SSHOperator for submitting pyspark applications in airflow

I have separate Spark and Airflow servers, and I don't have the Spark binaries on the Airflow servers. I am able to use SSHOperator and run the Spark jobs in cluster mode perfectly well. I would like to know which would be better to use, SSHOperator or…
kavya
  • 75
  • 1
  • 10
1
vote
1 answer

Submitting a pyspark job to Amazon EMR cluster from terminal

I have SSH-ed into the Amazon EMR server and I want to submit a Spark job (a simple word count file and a sample.txt are both on the Amazon EMR server) written in Python from the terminal. How do I do this and what's the syntax? The word_count.py…
ouila
  • 45
  • 1
  • 9
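A minimal sketch, assuming both files sit in the home directory on the EMR master node; the YARN master and client deploy mode are typical for EMR but are assumptions here, and the script is expected to read its input path from sys.argv.

```bash
# word_count.py and sample.txt are named in the question; the rest is assumed.
# In client mode on YARN the executors run on other nodes, so if they cannot
# read a local path, put the input on HDFS (or S3) first:
hdfs dfs -put sample.txt /user/hadoop/sample.txt

spark-submit --master yarn --deploy-mode client \
  word_count.py hdfs:///user/hadoop/sample.txt
```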
1
vote
2 answers

Spark-submit configuration: jars, packages

Can anyone tell me how to use jars and packages? I'm working on a web application. For the engine side (spark-mongo): bin/spark-submit --properties-file config.properties --packages …
vishal
  • 25
  • 8
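Roughly, --jars takes a comma-separated list of jar paths that are shipped as-is, while --packages takes Maven coordinates (group:artifact:version) that spark-submit resolves, including transitive dependencies, from Maven Central or the repositories given via --repositories. A hedged sketch; the connector version, jar paths, and application file are placeholders.

```bash
# Illustrative coordinates and paths only.
spark-submit \
  --properties-file config.properties \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1 \
  --jars /opt/libs/extra-lib.jar,/opt/libs/another-lib.jar \
  engine.py
```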
1
vote
2 answers

Is it possible to run "spark-submit" in Databricks without creating jobs? If yes, what are the possibilities?

I am trying to execute spark-submit in a Databricks workspace notebook without creating jobs. Help me!
Thiru Balaji G
  • 163
  • 2
  • 10
1
vote
1 answer

FileNotFoundError: No such file or directory for spark-submit encountered when running pyspark commands on Heroku

Background: I built an XGBClassifier model for content-based filtering and an ALS model for collaborative filtering (for ALS, I imported from pyspark.ml) and took the weighted sum of rating predictions from both to yield the final rating…
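pyspark starts a JVM by invoking spark-submit, so this FileNotFoundError usually means the spark-submit script (or Java itself) is not reachable on the dyno. With the pip-installed pyspark package the script ships inside the package; a diagnostic sketch, with paths that depend on the buildpack:

```bash
# Print where the bundled spark-submit should live.
python -c "import pyspark, os; print(os.path.join(os.path.dirname(pyspark.__file__), 'bin'))"

# If that directory exists, expose it (a JRE must also be installed,
# e.g. via a JVM buildpack, for spark-submit to start the JVM).
export SPARK_HOME="$(python -c 'import pyspark, os; print(os.path.dirname(pyspark.__file__))')"
export PATH="$SPARK_HOME/bin:$PATH"
```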
1
vote
0 answers

Spark Job RPC Issue

My Spark job succeeds, but it takes longer than usual. I am getting the following message repeatedly on the driver node when the job gets accepted as well as while it's running: 20/05/01 01:17:05 INFO resource.u: Set a new configuration for the…
user2597100
  • 97
  • 1
  • 10
1
vote
1 answer

Pyspark: Container exited with a non-zero exit code 143

I have seen various threads on this issue, but the solutions given are not working in my case. The environment is pyspark 2.1.0 with Java 7, and it has enough memory and cores. I am running a spark-submit job which deals with JSON files; the job runs…
Mahesh
  • 75
  • 2
  • 9
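Exit code 143 is 128 + SIGTERM (15); on YARN it most often means the NodeManager killed a container for exceeding its memory allowance. The usual first step is to raise executor memory and the off-heap overhead (spark.yarn.executor.memoryOverhead is the Spark 2.1-era name). The values below are placeholders to show the knobs, not tuned recommendations.

```bash
# your_job.py and all sizes are placeholders.
spark-submit \
  --master yarn --deploy-mode cluster \
  --executor-memory 6g \
  --executor-cores 2 \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  your_job.py
```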
1
vote
0 answers

java.net.SocketTimeoutException when scanning an HBase table with a regex filter

Due to the rowkey design I need to perform a regex scan filter, which to my understanding scans the entire set of rowkeys of that table. The problem I am facing is that the default limit is callTimeout=60000 and I am going beyond that…
Ignacio Alorre
  • 7,307
  • 8
  • 57
  • 94
1
vote
1 answer

How to submit a Spark job whose jar is hosted in an S3 object store

I have a Spark cluster with YARN, and I want to put my job's jar into a 100% S3-compatible object store. To submit the job, I searched on Google and it seems to be done simply this way: spark-submit --master yarn --deploy-mode cluster…
Danny
  • 31
  • 1
  • 3
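A sketch with placeholder endpoint, credentials, class, and bucket; it assumes the hadoop-aws and matching AWS SDK jars are available to both the submitting host and the YARN nodes, since both need to resolve the s3a:// scheme.

```bash
# Endpoint, keys, class name, and bucket are placeholders.
spark-submit \
  --master yarn --deploy-mode cluster \
  --class com.example.MyJob \
  --conf spark.hadoop.fs.s3a.endpoint=https://object-store.example.com \
  --conf spark.hadoop.fs.s3a.access.key=ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=SECRET_KEY \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  s3a://my-bucket/jars/my-job.jar
```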
1
vote
1 answer

How can I transform top level dependencies from mvn dependency:tree into a list of Maven coordinates using bash?

To enable creating a spark-submit command for my applications without creating uber-jars, I want to create a comma-separated list of Maven coordinates of the application's top-level dependencies during my build process, which I can then use in…
Danny Varod
  • 17,324
  • 5
  • 69
  • 111
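One possible bash sketch, assuming the plain text output of dependency:tree, no classifiers on the top-level coordinates, and that group:artifact:version is the format wanted for spark-submit --packages:

```bash
#!/usr/bin/env bash
# Write the tree to a file without the [INFO] log prefix.
mvn -q dependency:tree -DoutputType=text -DoutputFile=deps.txt

# Depth-1 dependencies start with "+- " or "\- " and look like
# group:artifact:packaging:version:scope; keep group, artifact and version,
# then join everything with commas.
grep -E '^[+\\]- ' deps.txt \
  | sed -E 's/^[+\\]- //' \
  | awk -F: '{print $1 ":" $2 ":" $4}' \
  | paste -sd, -
```

If the pipeline's output is redirected to a file, it can then be handed to spark-submit as --packages "$(cat deps.csv)", with deps.csv being whatever name the build uses.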