Questions tagged [spark-submit]

spark-submit is a script used to launch Apache Spark applications written in, for example, Java, Scala, or Python.

More information about spark-submit can be found in the official Apache Spark documentation on submitting applications.
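A minimal invocation, for illustration; the class, jar, and script names below are placeholders, not taken from any question on this page:

    # Launch a compiled Scala/Java application on a local master.
    spark-submit \
      --class com.example.MyApp \
      --master "local[*]" \
      path/to/my-app.jar arg1 arg2

    # A Python application is submitted the same way, without --class.
    spark-submit --master "local[*]" path/to/my_script.py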

611 questions
0
votes
2 answers

"A master URL must be set in your configuration" causes a lot of confusion

I have compiled my Spark Scala code in Eclipse. I am trying to run my jar on EMR (5.9.0, Spark 2.2.0) using the spark-submit option, but when I run it I get an error. Details: Exception in thread "main" org.apache.spark.SparkException: A master URL must be…
Sudarshan kumar • 1,503 • 4 • 36 • 83
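This error normally means the application never received a master URL, either through SparkConf or on the command line. On EMR the cluster manager is YARN, so one common fix (sketched here with placeholder class and jar names, not the asker's actual ones) is to let spark-submit supply the master instead of hard-coding it:

    # EMR runs Spark on YARN, so pass the master at submit time.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.Main \
      s3://my-bucket/my-app.jar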
0
votes
1 answer

Spark Standalone --total-executor-cores

I'm using a Spark 2.1.1 standalone cluster. Although I have 29 free cores in my cluster (Cores in use: 80 Total, 51 Used), when I submit a new Spark job with --total-executor-cores 16, this setting does not take effect and the job is submitted with only 6…
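In standalone mode cores are handed out in whole executors, so the number actually allocated depends on how many executors of the configured size fit on the workers that still have free cores. A hedged sketch that makes the executor size explicit (all values are illustrative):

    # Ask for 16 cores total in executors of 4 cores each, i.e. up to 4 executors.
    spark-submit \
      --master spark://master-host:7077 \
      --total-executor-cores 16 \
      --executor-cores 4 \
      --class com.example.Job \
      my-job.jar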
0
votes
1 answer

Spark --- zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error

I am trying to load JSON into HBase using the following command: spark-shell --master local[*] --jars…
Ram • 189 • 1 • 4 • 19
0
votes
1 answer

Trouble Submitting Apache Spark Application to Containerized Cluster

I am having trouble running a Spark application using both spark-submit and the internal REST API. The deployment scenario I would like to demonstrate is Spark running as a cluster on my local laptop. To that end, I've created two Docker containers…
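For a containerized standalone cluster, the submitting client has to reach the master's ports from outside the containers: 7077 for legacy submission and, by default, 6066 for the REST endpoint used in cluster deploy mode. A hedged sketch, with spark-master standing in for the container's hostname:

    # Client-mode submission straight to the standalone master.
    spark-submit \
      --master spark://spark-master:7077 \
      --class com.example.Demo \
      demo.jar

    # Cluster-mode submission goes through the REST endpoint instead.
    spark-submit \
      --master spark://spark-master:6066 \
      --deploy-mode cluster \
      --class com.example.Demo \
      demo.jar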
0
votes
1 answer

Apache Beam Spark Runner Issues

We are trying to submit a Spark job through YARN with the following command: spark-submit --conf spark.yarn.stagingDir=/path/to/stage --verbose --class com.my.class --jars /path/to/jar1,path/to/jar2 /path/to/main/jar/application.jar The…
DanOpi • 133 • 2 • 12
0
votes
0 answers

Spark Standalone cluster only two workers utilized

In a Spark standalone cluster, only 2 of the 6 worker instances get utilized; the rest are idle. I used two VMs, both with 4 cores. Two workers were on the local VM (where the master was started) and four workers were on the other VM. Only the local two got…
0
votes
0 answers

Why can't I run spark-shell?

I just downloaded Hadoop, Spark, and Hive for a MOOC. I am running Ubuntu 17.10 in a virtual machine. I can run various Hadoop commands, but when I try to run "spark-shell" I get an error: bin/spark-shell: line 57:…
0
votes
1 answer

How to send ES configurations using spark-submit?

How do I send ES configurations with the spark-submit command, the way it is done for Hive? Example: spark-submit ... --files hive-site.xml --jars ... That lets me access Hive tables through Spark SQL; I want to do something similar for ES. Any hints?
no123ff • 307 • 5 • 16
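spark-submit only forwards --conf keys that start with spark., and the elasticsearch-hadoop connector (assuming that is the library in play) also accepts its es.* settings under that prefix. A hedged sketch; the host, jar version, and script name are placeholders:

    # Hypothetical ES connection settings; the connector strips the spark. prefix.
    spark-submit \
      --conf spark.es.nodes=es-host \
      --conf spark.es.port=9200 \
      --jars elasticsearch-spark-20_2.11-6.2.0.jar \
      my-es-app.py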
0
votes
0 answers

How to execute a plain Java program using spark-submit?

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import scala.Tuple2;
import java.util.Arrays;

public class Sample { private static final Logger LOGGER…
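spark-submit ultimately just invokes the main method of the class named by --class, so a program like the one above can be packaged into a jar and launched like any Spark application (the jar name is a placeholder):

    # Package the class into a jar, then:
    spark-submit \
      --class Sample \
      --master "local[*]" \
      sample.jar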
0
votes
1 answer

spark-submit fails when not connected to the internet

When I try to submit a Spark job with spark-submit using the --packages argument, I expect Spark to search the local repository for the artifacts first and use them if they exist. Instead, I observe that every time, Spark tries to fetch the artifacts from…
serkan • 555 • 7 • 13
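--packages resolves dependencies through Ivy, whose cache lives under ~/.ivy2 by default. One hedged workaround for offline machines is to point resolution at a local on-disk repository, or to bypass resolution entirely with --jars (the coordinates and paths here are illustrative):

    # Resolve from a local repository instead of the network.
    spark-submit \
      --packages com.example:mylib:1.0 \
      --repositories file:/opt/local-repo \
      app.jar

    # Or skip resolution and ship the jars directly.
    spark-submit --jars /opt/jars/mylib-1.0.jar app.jar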
0
votes
0 answers

Spark and YARN: how to make them work together

I have a conceptual doubt about YARN and Spark. I have two YARN (AM) nodes with 28 GB and 4 CPUs each, and a worker node with 56 GB and 8 CPUs. I always submit my applications through YARN with yarn-cluster in the spark-submit option. How do I use all the memory…
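Cluster usage is controlled per executor, so the knobs are the executor count, memory, and cores. A hedged sizing sketch for nodes of roughly this shape (the numbers are illustrative and deliberately leave headroom for YARN's own memory overhead):

    # Six executors of 8 GB / 2 cores spread across the NodeManagers.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 6 \
      --executor-memory 8g \
      --executor-cores 2 \
      --class com.example.App \
      app.jar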
0
votes
0 answers

java.lang.ClassNotFoundException: SparkSql at java.net.URLClassLoader.findClass(Unknown Source)

SparkSql.scala:
import org.apache.spark.sql.SparkSession
object SparkSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("SparkSql1").getOrCreate()
    //val data =…
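A ClassNotFoundException at launch usually means the value passed to --class does not name a class that is actually inside the submitted jar. Since the object above has no package declaration, the bare name would be used; the jar path here is a placeholder:

    spark-submit \
      --class SparkSql \
      --master "local[*]" \
      target/scala-2.11/sparksql_2.11-1.0.jar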
0
votes
1 answer

Spark application output to a local directory

PROBLEM: Spark application fails with "Mkdirs failed to create". I'm using Spark 1.6.3 and am unable to save output to my local directory: java.io.IOException: Mkdirs failed to create…
0
votes
1 answer

Can spark-submit be used as a job scheduler?

I have a Spark standalone cluster with no other job scheduler installed. I wonder whether spark-submit can be used as a job scheduler for both Spark and non-Spark jobs (e.g., a Scala jar not written for Spark and not using RDDs)? Based on my testing,…
blueskyddd • 431 • 4 • 12
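spark-submit is a launcher, not a scheduler: each invocation starts one application by calling the main method of --class, which works even for a jar that never creates a SparkContext. A hedged illustration with placeholder names:

    # Runs PlainMain.main(args) in a JVM; no Spark API is required inside.
    spark-submit \
      --class com.example.PlainMain \
      --master "local[1]" \
      plain-app.jar arg1 arg2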
0
votes
1 answer

CLI arguments with spark-submit when executing a Python file

I am trying to convert SQL Server tables to .csv format with the code below in PySpark.
from pyspark import SparkContext
sc = SparkContext("local", "Simple App")
from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)
df =…
user3521180 • 1,044 • 2 • 20 • 45
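Anything placed after the script name on the spark-submit command line is passed to the script itself, where PySpark code can read it from sys.argv. A hedged sketch; the script and table names are placeholders:

    # "mytable" arrives inside the script as sys.argv[1].
    spark-submit export_to_csv.py mytable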