Questions tagged [spark-submit]

spark-submit is a script used to launch Apache Spark applications written in, for example, Java, Scala, or Python.

More information about spark-submit can be found in the official documentation on submitting applications (https://spark.apache.org/docs/latest/submitting-applications.html).
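For quick reference, the general shape of a spark-submit invocation, as given in that documentation, is:

```shell
# General form of a spark-submit invocation; everything in angle
# brackets is a placeholder for your application's own values.
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <client-or-cluster> \
  --conf <key>=<value> \
  <application-jar> \
  <application-arguments>
```

For Python applications, a .py file is passed in place of the jar, and Python dependencies go in --py-files.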

611 questions
2 votes • 0 answers

How to use local application script with spark-submit on mesos/docker?

I'm using spark-submit in "cluster" mode with a Python script against a Spark cluster running on Mesos, and a custom Docker image for the executor set in spark.mesos.executor.docker.image. My script file is already baked into the Docker image (let's…
2 votes • 1 answer

SparkPi on kubernetes - Could not find or load main class?

I'm trying to start the standard SparkPi example on a Kubernetes cluster. spark-submit creates the pod, which fails with the error "Error: Could not find or load main class org.apache.spark.examples.SparkPi". spark-submit \ --master…
JDev • 2,157 • 3 • 31 • 57
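As a point of comparison, a minimal SparkPi submission against Kubernetes, loosely following the official "Running on Kubernetes" guide, looks like the sketch below; the API server address, image name, and examples jar version are assumptions that must match the actual cluster and image, not values from the question:

```shell
# Sketch only: <k8s-apiserver>, <spark-image>, and the examples jar
# version are placeholders.
bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-<version>.jar
```

The local:// scheme refers to a path inside the container image, so the jar must actually exist at that path in the image.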
2 votes • 1 answer

Spark-submit cluster mode - NoClassDefFoundError

I am getting the below error while submitting a Spark job in cluster mode; --deploy-mode client mode works fine. /usr/spark2.0.2/bin/spark-submit --name hello --master yarn --deploy-mode client --executor-memory 1g --executor-cores 1…
2 votes • 2 answers

Spark java.lang.OutOfMemoryError: Java heap space

I am getting the above error when I run a model-training pipeline with Spark: `val inputData = spark.read .option("header", true) .option("mode","DROPMALFORMED") .csv(input) .repartition(500) .toDF("b", "c") .withColumn("b",…
user3245722 • 323 • 2 • 3 • 9
2 votes • 1 answer

No such file or directory in spark cluster mode

I am writing a spark-streaming application using PySpark which basically processes the data. A short packaging overview: this application contains several modules and some config files which are non-.py files (e.g. .yaml or .json). I am packaging this…
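One common packaging approach for a multi-module PySpark application with non-.py resources uses spark-submit's --py-files and --files options; all file names below are hypothetical, not taken from the question:

```shell
# Hypothetical names: deps.zip, config.yaml, main.py.
# --py-files distributes Python modules; --files ships plain data
# files to the working directory of the driver (in cluster mode)
# and the executors.
zip -r deps.zip mypackage/
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --py-files deps.zip \
  --files config.yaml \
  main.py
```

Files shipped with --files should then be opened by bare file name, since they are localized into the container's working directory rather than any local absolute path.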
2 votes • 2 answers

HDFS file access in Spark

I am developing an application where I read a file from Hadoop, process it, and store the data back to Hadoop. I am confused about the proper HDFS file path format. When reading an HDFS file from the spark shell, like val…
Girish Bhat M • 392 • 3 • 13
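For background, an HDFS path can be written either as a fully qualified URI or, when fs.defaultFS is set in core-site.xml, as a bare absolute path; the hostname and port below are assumptions:

```shell
# Fully qualified form; <namenode-host> is a placeholder and 8020 is
# only a common default for the namenode RPC port:
hdfs dfs -ls hdfs://<namenode-host>:8020/user/me/
# Short form, resolved against fs.defaultFS:
hdfs dfs -ls /user/me/
```

The same two forms work for paths passed to spark.read or sc.textFile.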
2 votes • 1 answer

Is there any possibility of setting spark.scheduler.pool (a local property) from spark-submit?

I am using the fair scheduler along with YARN. spark.scheduler.pool is a local property set on the Spark context while executing jobs, for a configured pool, like: val sc: SparkContext = ??? sc.setLocalProperty("spark.scheduler.pool", "myPool") I was…
user3190018 • 890 • 13 • 26
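Whatever the answer for the pool itself, the fair scheduler mode and its allocation file are ordinary Spark configuration entries, so those at least can be supplied from spark-submit; the file path and jar name below are placeholders:

```shell
# Placeholder path and jar name. Note that spark.scheduler.pool itself
# is a per-thread local property, distinct from these global settings.
spark-submit \
  --conf spark.scheduler.mode=FAIR \
  --conf spark.scheduler.allocation.file=/path/to/fairscheduler.xml \
  myapp.jar
```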
2 votes • 2 answers

Spark job not using the worker nodes on the cluster

I have set up Spark on a cluster of 3 nodes: one is my namenode-master (named h1) and the other two are my datanode-workers (named h2 and h3). When I give the command to run a Spark job on my master, it seems like the job is not getting distributed to…
learning_dev • 115 • 2 • 13
2 votes • 0 answers

spark-submit class not found in spark 2.2.0

Spark Submit command: /opt/cmsgraph/spark/default/bin/spark-submit -v \ --driver-java-options -Djava.io.tmpdir=/opt/cmsgraph/temp --conf spark.cassandra.connection.timeout_ms=60000 \ --conf spark.cassandra.input.fetch.size_in_rows=1 \ --conf…
2 votes • 1 answer

Is it true that with mesos I can start only one executor per node in spark-submit?

I would like to know if it is true that on Mesos we can have only one executor per node. Context: I am running a spark-submit (Spark 2.0.1) job on a cluster of 5 nodes (workers), each with 80 CPUs and 512 GB of memory, in coarse-grained mode. Official…
astro_asz • 2,278 • 3 • 15 • 31
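As context for the question: in coarse-grained mode, without an explicit executor size, Spark launches a single executor per Mesos agent that grabs all offered cores. A hedged sketch of capping executor size so that a large node could host more than one executor (all values assumed, not taken from the question):

```shell
# All sizes are assumptions for illustration; <mesos-master> is a
# placeholder. Setting spark.executor.cores caps each executor so
# several can fit on one agent.
spark-submit \
  --master mesos://<mesos-master>:5050 \
  --conf spark.executor.cores=10 \
  --conf spark.executor.memory=48g \
  --conf spark.cores.max=400 \
  myapp.jar
```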
2 votes • 1 answer

Can't download packages in spark-shell from password protected private nexus

I am trying to use a private, password-protected Nexus to download and add private jar artifacts to the spark-shell classpath, but the download fails. The Spark documentation (https://spark.apache.org/docs/latest/submitting-applications.html)…
2 votes • 1 answer

Spark: Loading log4j from a JAR when running spark-submit

I have developed a custom log4j configuration for my Spark application: ####################### # Roll by time # ####################### log4j.logger.myLogger=DEBUG, file…
Borja • 194 • 1 • 3 • 17
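One widely documented pattern for shipping a custom log4j.properties is to distribute it with --files and point the driver and executor JVMs at it; the file and jar names are examples, not taken from the question:

```shell
# Example names only. --files localizes log4j.properties into each
# container's working directory, so a bare file: URI resolves there.
spark-submit \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  myapp.jar
```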
2 votes • 1 answer

Issues in EMR add step to submit a spark job

I tried to submit a Spark job with 2 dependent jar packages, A.jar and B.jar, on EMR with the below command: aws emr add-steps --cluster-id j-1WM5F79YY6EIN --steps Type=Spark,Name="test",…
Jeff • 267 • 5 • 20
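For reference, a hedged sketch of an add-steps call with dependent jars; the bucket, class, and jar names are placeholders, and the usual sticking point is that the comma-separated --jars value must itself be quoted inside Args:

```shell
# Placeholders throughout; note the quoting around the
# comma-separated --jars list inside the Args array.
aws emr add-steps --cluster-id j-1WM5F79YY6EIN --steps \
  'Type=Spark,Name=test,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--class,com.example.Main,--jars,"s3://mybucket/A.jar,s3://mybucket/B.jar",s3://mybucket/app.jar]'
```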
2 votes • 1 answer

spark-submit pipeline model

I have an Apache Spark cluster (1 master + 1 worker) running on Docker. I'm able to submit a job using spark-submit that fits a pipeline, which is then saved (PipelineModel.save(path)). The file is saved on my local machine, exactly at the point…
2 votes • 2 answers

Spark-submit creates only 1 executor when the pyspark interactive shell creates 4 (both using yarn-client)

I'm using the quickstart Cloudera VM (CDH 5.10.1) with PySpark (1.6.0) and YARN (MR2 included) to aggregate numerical data per hour. I've got 1 CPU with 4 cores and 32 GB of RAM. I've got a file named aggregate.py but until today I never submitted…
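In yarn-client mode the executor count does not scale up by itself; it comes from --num-executors (or from dynamic allocation, when enabled). A sketch with assumed sizes for a small 4-core VM; only aggregate.py is a name from the question:

```shell
# Sizes are assumptions for a 4-core machine with plenty of RAM;
# --num-executors is specific to running on YARN.
spark-submit \
  --master yarn-client \
  --num-executors 4 \
  --executor-cores 1 \
  --executor-memory 2g \
  aggregate.py
```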