Questions tagged [spark-submit]

spark-submit is the script used to launch apache-spark applications written in, for example, Java, Scala, or Python.

More information about spark-submit can be found in the official Spark documentation on submitting applications: https://spark.apache.org/docs/latest/submitting-applications.html

611 questions
2 votes, 1 answer

Spark Error - cannot assign instance of SerializedLambda to field javaRDDLike of type FlatMapFunction

I’ve been trying for a while to launch a simple spring-spark app on the cluster, but I ran into the following problem: Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field…
Jecushunk • 31 • 5
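This ClassCastException usually means the driver and executors are deserializing the lambda against different copies of the application or Spark classes, e.g. when a Spark version bundled into the fat jar clashes with the cluster's. A minimal sketch of the commonly suggested mitigation, assuming a Maven build and a hypothetical class name:

    # Mark Spark itself as "provided" in the build so the fat jar does not
    # bundle a second, conflicting copy of Spark classes, e.g. in the POM:
    #   <artifactId>spark-core_2.11</artifactId> <scope>provided</scope>
    # then let spark-submit ship the application jar to the executors.
    spark-submit \
      --class com.example.SpringSparkApp \
      --master yarn --deploy-mode cluster \
      app.jar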
2 votes, 1 answer

How to override Spark jars while running the spark-submit command in cluster mode? (okhttp3)

There is a conflict between a jar in my project and a jar in the spark-2.4.0 jars folder. My Retrofit dependency brings in okhttp-3.13.1.jar (verified with mvn dependency:tree), but Spark on the server has okhttp-3.8.1.jar, and I get a NoSuchMethodException. So, I'm trying to give my…
Saawan • 363 • 6 • 24
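One commonly cited way to prefer the application's copy of a library over the version shipped in Spark's jars directory is the (experimental) userClassPathFirst switches. A hedged sketch, with the jar names taken from the question and the application jar name assumed:

    # Prefer classes from the user-supplied jars over Spark's bundled jars.
    # Both properties are marked experimental in the Spark docs.
    spark-submit \
      --master yarn --deploy-mode cluster \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      --jars okhttp-3.13.1.jar \
      my-app.jar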
2 votes, 0 answers

org.apache.hadoop.net.ConnectTimeoutException: Call From local_computer to hadoop_server failed on socket timeout exception

We're trying to execute a local spark-submit against a YARN cluster that lives on another server. The idea is to externalize the submission from a JupyterLab running in a cloud container to a JupyterLab in a local environment; we…
Idhem • 880 • 1 • 9 • 22
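For a submission from a local machine to a remote YARN cluster, spark-submit discovers the ResourceManager address from the Hadoop configuration directory, so a first check is that those files exist locally and that the RM ports are reachable from the client. A sketch, with the config path purely hypothetical:

    # Point the local client at a copy of the remote cluster's configs.
    # The directory must contain core-site.xml, hdfs-site.xml and
    # yarn-site.xml exported from the cluster.
    export HADOOP_CONF_DIR=/etc/hadoop/conf-remote
    export YARN_CONF_DIR="$HADOOP_CONF_DIR"
    spark-submit --master yarn --deploy-mode cluster app.py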
2 votes, 1 answer

spark-submit cannot use dependency in my jar file

I made a jar file which has org.apache.spark.sql.kafka10 inside (using spark-sql-kafka-0-10_2.11:2.4.3). But when I execute it with ./bin/spark-submit --class MYCLASS --master local[*] MYJAR.jar, I get an error like the one below. Exception in thread "main"…
Kimjungwow • 65 • 1 • 7
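The Kafka connector classes are usually resolved at submit time rather than baked into the application jar; the documented route is --packages (or building a proper fat jar with an assembly plugin). A sketch reusing the class and jar names from the question:

    # Let spark-submit resolve the Kafka connector and its transitive
    # dependencies from Maven Central at launch time.
    ./bin/spark-submit \
      --class MYCLASS \
      --master 'local[*]' \
      --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.3 \
      MYJAR.jar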
2 votes, 0 answers

SparkR: source() other R files in an R Script when running spark-submit

I'm new to Spark and newer to R, and am trying to figure out how to 'include' other R scripts when running spark-submit. Say I have the following R script, which "sources" another R script: main.R source("sub/fun.R") mult(4, 2) The second R script…
Joe J • 9,985 • 16 • 68 • 100
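One hedged approach is to ship the helper script with --files, which places it in each container's working directory under its bare file name, and adjust the source() call accordingly. A sketch, assuming the file layout from the question:

    # Ship sub/fun.R alongside the main script. Because --files flattens
    # paths, main.R would then load it with source("fun.R").
    spark-submit --master yarn --deploy-mode cluster \
      --files sub/fun.R \
      main.R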
2 votes, 0 answers

The exit code of spark-submit is still 0 in a Unix script when a YARN application fails

I am submitting my Spark job on YARN using Unix scripts and spark-submit commands. I am checking the status of the Spark job in a Unix if/else block and throwing an error; however, I observed that even if the Spark job fails or is incomplete on YARN, it still shows…
user10437665 • 95 • 2 • 9
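In yarn cluster mode, spark-submit's exit status only reflects the application's final state when the client waits for completion, so a stricter wrapper is usually sketched like this (application jar name assumed):

    # Wait for the YARN application to finish and propagate its status.
    # spark.yarn.submit.waitAppCompletion defaults to true in cluster mode,
    # but setting it explicitly guards against an overridden default.
    spark-submit --master yarn --deploy-mode cluster \
      --conf spark.yarn.submit.waitAppCompletion=true \
      app.jar
    status=$?
    if [ "$status" -ne 0 ]; then
      echo "Spark job failed with exit code $status" >&2
      exit "$status"
    fi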
2 votes, 1 answer

Remove JAR from Spark default classpath in EMR

I'm executing a spark-submit script in an EMR step that has my super JAR as the main class, like spark-submit \ .... --class ${MY_CLASS} "${SUPER_JAR_S3_PATH}" etc., but Spark is by default loading the jar…
user3613290 • 461 • 6 • 18
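There is no supported flag to remove a jar from Spark's default classpath, so answers to this kind of question usually invert the problem and make the application's copy win instead. A hedged sketch, with the dependency path hypothetical and the variables taken from the question:

    # extraClassPath entries are prepended to the classpath, so the bundled
    # dependency shadows the copy EMR puts on the default classpath.
    spark-submit \
      --conf spark.driver.extraClassPath=/home/hadoop/my-dep.jar \
      --conf spark.executor.extraClassPath=/home/hadoop/my-dep.jar \
      --class "${MY_CLASS}" "${SUPER_JAR_S3_PATH}"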
2 votes, 1 answer

Popen: redirect stderr and stdout to single stream

I have created a wrapper around the spark-submit command to generate real-time events by parsing the logs. The purpose is to create a real-time interface showing the detailed progress of a Spark job. So the wrapper will look like this: …
Codious-JR • 1,658 • 3 • 26 • 48
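The core of what the wrapper needs is merging stderr into stdout so a single stream carries every log line; in Popen terms that corresponds to passing stderr=subprocess.STDOUT. The shell-level equivalent, as a sketch with a placeholder parsing step:

    # 2>&1 folds Spark's log output (stderr) into stdout, so one pipe can
    # be read line by line for progress events.
    spark-submit --master yarn app.py 2>&1 | while IFS= read -r line; do
      printf 'EVENT: %s\n' "$line"   # placeholder: parse the line here
    done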
2 votes, 0 answers

Spark submit on YARN failing with: LeaseExpiredException: No lease on /user/ck/.sparkStaging does not have open files

I'm launching a Spark app on YARN using spark-submit. It fails with a LeaseExpiredException, with the stack trace below referring to the keytab used to launch the Spark app. The cluster is Kerberos- and WANdisco-enabled. Any ideas on what could've caused this?…
chanakya • 21 • 3
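A LeaseExpiredException on .sparkStaging typically means something deleted or rewrote the staging files while the submission still held the HDFS lease; with WANdisco in the picture, replication touching that path is a plausible suspect. One hedged mitigation is relocating the staging directory to a non-replicated path; all paths and principals below are illustrative:

    # spark.yarn.stagingDir moves the per-application staging directory
    # away from /user/<name>/.sparkStaging.
    spark-submit --master yarn --deploy-mode cluster \
      --principal ck@EXAMPLE.COM --keytab /etc/security/keytabs/ck.keytab \
      --conf spark.yarn.stagingDir=hdfs:///tmp/spark-staging \
      app.jar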
2 votes, 1 answer

How to pass JSON as a single argument in spark-submit?

spark-submit --class com.HelloWorld \ --master yarn --deploy-mode client \ --executor-memory 5g /home/Hadoop-Work/HelloWorld.jar \ "/home/Hadoop-Work/application.properties"…
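The usual trick is to single-quote the JSON so the shell passes it through as one argv entry. A sketch reusing the command from the question, with the JSON payload itself invented for illustration:

    # Single quotes keep the JSON, including its double quotes and spaces,
    # intact as a single argument to the main class.
    spark-submit --class com.HelloWorld \
      --master yarn --deploy-mode client \
      --executor-memory 5g /home/Hadoop-Work/HelloWorld.jar \
      '{"env":"dev","batch_date":"2019-01-01"}'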
2 votes, 2 answers

Difference between running a Spark application with sbt run and with the spark-submit script

I am new to Spark, and as I learn this framework I have figured out that, to the best of my knowledge, there are two ways to run a Spark application written in Scala: package the project into a JAR file and then run it with the…
YACINE GACI • 145 • 2 • 13
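The two workflows from the question, side by side as a sketch (artifact path and class name are placeholders):

    # 1) Run inside sbt's own JVM: convenient for local iteration, but the
    #    application is not launched through Spark's deploy machinery.
    sbt run

    # 2) Package first, then hand the jar to spark-submit, which sets up
    #    the classpath, master URL and deploy mode for a real launch.
    sbt package
    spark-submit --class com.example.Main --master 'local[*]' \
      target/scala-2.11/myapp_2.11-0.1.jar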
2 votes, 1 answer

Call R notebooks on Databricks from a second R notebook

I am trying to call an R notebook on Databricks while passing parameters using spark-submit. My approach looks like this: com <- "spark-submit foo.R p1 & spark-submit foo.R p2" system(com) This should call the script foo.R and hand over the parameter…
CKre • 181 • 2 • 13
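As written, the & backgrounds the first spark-submit and the shell returns before either call finishes. If sequential, checked execution is what is intended, the command handed to system() would more usually look like this sketch, reusing the names from the question:

    # Run the two submissions one after the other; && stops the chain if
    # the first spark-submit exits with a non-zero status.
    spark-submit foo.R p1 && spark-submit foo.R p2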
2 votes, 2 answers

How can I execute an s3-dist-cp command within a spark-submit application?

I have a jar file that is provided to spark-submit. Within a method in the jar, I'm trying to do: import sys.process._ s3-dist-cp --src hdfs:///tasks/ --dest s3:// I also installed s3-dist-cp on all slaves along with…
Ram • 159 • 1 • 10
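With the flags written as plain double hyphens, the external call the question sketches in Scala boils down to this shell command; the destination bucket is a placeholder, and s3-dist-cp must be on the PATH of whichever node actually runs it (on EMR it normally lives on the master):

    # Copy the HDFS directory to S3; source and destination illustrative.
    s3-dist-cp --src hdfs:///tasks/ --dest s3://my-bucket/tasks/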
2 votes, 1 answer

How to reference .so files in spark-submit command

I am using the TimesTen database with Spark 2.3.0. I need to reference .so files in the spark-submit command in order to connect to the TimesTen DB. Is there any option for this in spark-submit? I tried adding the .so file in --conf spark.executor.extraLibraryPath…
Curious Techie • 185 • 2 • 15
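A commonly suggested combination is shipping the native library with --files and pointing the library path at the container's working directory, where --files lands it. A hedged sketch; the TimesTen library file name and path are hypothetical:

    # Distribute the .so to every container and make the dynamic linker
    # look in the working directory.
    spark-submit --master yarn --deploy-mode cluster \
      --files /opt/timesten/lib/libttJdbcCS.so \
      --conf spark.executor.extraLibraryPath=. \
      --conf spark.driver.extraLibraryPath=. \
      app.jar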
2 votes, 2 answers

PySpark spark-submit command with --files argument error

I am running a PySpark job on a Spark 2.3 cluster with the following command: spark-submit --deploy-mode cluster --master yarn --files ETLConfig.json PySpark_ETL_Job_v0.2.py ETLConfig.json contains a parameter passed to the PySpark script. I am…
AngiSen • 915 • 4 • 18 • 41
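--files copies the listed file into each YARN container's working directory, so in cluster mode the script should open it by bare name rather than by a client-local path. A sketch reusing the command from the question:

    # ETLConfig.json is staged into the container working directory, so the
    # PySpark script can read it as a relative path, e.g.
    #   with open("ETLConfig.json") as f: config = json.load(f)
    spark-submit --deploy-mode cluster --master yarn \
      --files ETLConfig.json \
      PySpark_ETL_Job_v0.2.py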