Questions tagged [spark-submit]

spark-submit is a script used to launch Apache Spark applications written in, for example, Java, Scala, or Python.

More information about spark-submit can be found in the official documentation on submitting applications (https://spark.apache.org/docs/latest/submitting-applications.html).
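For quick reference, the general shape of a spark-submit invocation, as given in that documentation, is:

```shell
# General form of a spark-submit invocation; everything in angle
# brackets is a placeholder for your application's own values.
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <client-or-cluster> \
  --conf <key>=<value> \
  <application-jar> \
  <application-arguments>
```

For Python applications, a .py file is passed in place of the jar, and Python dependencies go in --py-files.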

611 questions
2 votes • 0 answers

How to use local application script with spark-submit on mesos/docker?

I'm using spark-submit in "cluster" mode with a Python script against a Spark cluster running on Mesos, and a custom Docker image for the executor set in spark.mesos.executor.docker.image. My script file is already baked into the Docker image (let's…
2 votes • 1 answer

SparkPi on kubernetes - Could not find or load main class?

I'm trying to start the standard SparkPi example on a Kubernetes cluster. spark-submit creates the pod, which fails with the error "Error: Could not find or load main class org.apache.spark.examples.SparkPi". spark-submit \ --master…
JDev • 2,157 • 3 • 31 • 57
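As a point of comparison, a minimal SparkPi submission against Kubernetes, loosely following the official "Running on Kubernetes" guide, looks like the sketch below; the API server address, image name, and examples jar version are assumptions that must match the actual cluster and image, not values from the question:

```shell
# Sketch only: <k8s-apiserver>, <spark-image>, and the examples jar
# version are placeholders.
bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-<version>.jar
```

The local:// scheme refers to a path inside the container image, so the jar must actually exist at that path in the image.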
2 votes • 1 answer

Spark-submit cluster mode - NoClassDefFoundError

I am getting the below error while submitting a Spark job in cluster mode; --deploy-mode client mode works fine. /usr/spark2.0.2/bin/spark-submit --name hello --master yarn --deploy-mode client --executor-memory 1g --executor-cores 1…
2 votes • 2 answers

Spark java.lang.OutOfMemoryError: Java heap space

I am getting the above error when I run a model-training pipeline with Spark: `val inputData = spark.read .option("header", true) .option("mode","DROPMALFORMED") .csv(input) .repartition(500) .toDF("b", "c") .withColumn("b",…
user3245722 • 323 • 2 • 3 • 9
2 votes • 1 answer

No such file or directory in spark cluster mode

I am writing a spark-streaming application using PySpark which basically processes the data. A short packaging overview: this application contains several modules and some config files which are non-.py files (e.g. .yaml or .json). I am packaging this…
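One common packaging approach for a multi-module PySpark application with non-.py resources uses spark-submit's --py-files and --files options; all file names below are hypothetical, not taken from the question:

```shell
# Hypothetical names: deps.zip, config.yaml, main.py.
# --py-files distributes Python modules; --files ships plain data
# files to the working directory of the driver (in cluster mode)
# and the executors.
zip -r deps.zip mypackage/
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --py-files deps.zip \
  --files config.yaml \
  main.py
```

Files shipped with --files should then be opened by bare file name, since they are localized into the container's working directory rather than any local absolute path.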
2 votes • 2 answers

HDFS file access in Spark

I am developing an application where I read a file from Hadoop, process it, and store the data back to Hadoop. I am confused about the proper HDFS file path format. When reading an HDFS file from the spark shell, like val…
Girish Bhat M • 392 • 3 • 13
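For background, an HDFS path can be written either as a fully qualified URI or, when fs.defaultFS is set in core-site.xml, as a bare absolute path; the hostname and port below are assumptions:

```shell
# Fully qualified form; <namenode-host> is a placeholder and 8020 is
# only a common default for the namenode RPC port:
hdfs dfs -ls hdfs://<namenode-host>:8020/user/me/
# Short form, resolved against fs.defaultFS:
hdfs dfs -ls /user/me/
```

The same two forms work for paths passed to spark.read or sc.textFile.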
2 votes • 1 answer

Is there any possibility of setting spark.scheduler.pool (a local property) from spark-submit?

I am using the fair scheduler along with YARN. spark.scheduler.pool is a local property set on the Spark context while executing jobs, for a configured pool, like: val sc: SparkContext = ??? sc.setLocalProperty("spark.scheduler.pool", "myPool") I was…
user3190018 • 890 • 13 • 26
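Whatever the answer for the pool itself, the fair scheduler mode and its allocation file are ordinary Spark configuration entries, so those at least can be supplied from spark-submit; the file path and jar name below are placeholders:

```shell
# Placeholder path and jar name. Note that spark.scheduler.pool itself
# is a per-thread local property, distinct from these global settings.
spark-submit \
  --conf spark.scheduler.mode=FAIR \
  --conf spark.scheduler.allocation.file=/path/to/fairscheduler.xml \
  myapp.jar
```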
2 votes • 2 answers

Spark job not using the worker nodes on the cluster

I have set up Spark on a cluster of 3 nodes: one is my namenode-master (named h1) and the other two are my datanode-workers (named h2 and h3). When I give the command to run a Spark job on my master, it seems like the job is not getting distributed to…
learning_dev • 115 • 2 • 13
2 votes • 0 answers

spark-submit class not found in spark 2.2.0

Spark Submit command: /opt/cmsgraph/spark/default/bin/spark-submit -v \ --driver-java-options -Djava.io.tmpdir=/opt/cmsgraph/temp --conf spark.cassandra.connection.timeout_ms=60000 \ --conf spark.cassandra.input.fetch.size_in_rows=1 \ --conf…
2 votes • 1 answer

Is it true that with mesos I can start only one executor per node in spark-submit?

I would like to know if it is true that on Mesos we can have only one executor per node. Context: I am running a spark-submit (Spark 2.0.1) job on a cluster of 5 nodes (workers), each with 80 CPUs and 512 GB of memory, in coarse-grained mode. Official…
astro_asz • 2,278 • 3 • 15 • 31
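As context for the question: in coarse-grained mode, without an explicit executor size, Spark launches a single executor per Mesos agent that grabs all offered cores. A hedged sketch of capping executor size so that a large node could host more than one executor (all values assumed, not taken from the question):

```shell
# All sizes are assumptions for illustration; <mesos-master> is a
# placeholder. Setting spark.executor.cores caps each executor so
# several can fit on one agent.
spark-submit \
  --master mesos://<mesos-master>:5050 \
  --conf spark.executor.cores=10 \
  --conf spark.executor.memory=48g \
  --conf spark.cores.max=400 \
  myapp.jar
```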
2 votes • 1 answer

Can't download packages in spark-shell from password protected private nexus

I am trying to use a private, password-protected Nexus to download and add private jar artifacts to the spark-shell classpath, but the download fails. The Spark documentation (https://spark.apache.org/docs/latest/submitting-applications.html)…
2 votes • 1 answer

Spark: Loading log4j from a JAR when running spark-submit

I have developed a custom log4j configuration for my Spark application: ####################### # Roll by time # ####################### log4j.logger.myLogger=DEBUG, file…
Borja • 194 • 1 • 3 • 17
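One widely documented pattern for shipping a custom log4j.properties is to distribute it with --files and point the driver and executor JVMs at it; the file and jar names are examples, not taken from the question:

```shell
# Example names only. --files localizes log4j.properties into each
# container's working directory, so a bare file: URI resolves there.
spark-submit \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  myapp.jar
```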
2 votes • 1 answer

Issues in EMR add step to submit a spark job

I tried to submit a Spark job with 2 dependent jar packages, A.jar and B.jar, on EMR with the below command: aws emr add-steps --cluster-id j-1WM5F79YY6EIN --steps Type=Spark,Name="test",…
Jeff • 267 • 5 • 20
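For reference, a hedged sketch of an add-steps call with dependent jars; the bucket, class, and jar names are placeholders, and the usual sticking point is that the comma-separated --jars value must itself be quoted inside Args:

```shell
# Placeholders throughout; note the quoting around the
# comma-separated --jars list inside the Args array.
aws emr add-steps --cluster-id j-1WM5F79YY6EIN --steps \
  'Type=Spark,Name=test,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--class,com.example.Main,--jars,"s3://mybucket/A.jar,s3://mybucket/B.jar",s3://mybucket/app.jar]'
```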
2 votes • 1 answer

spark-submit pipeline model

I have an Apache Spark cluster (1 master + 1 worker) running on Docker. I'm able to submit a job using spark-submit that fits a pipeline, which is then saved (PipelineModel.save(path)). The file is saved on my local machine, exactly at the point…
2 votes • 2 answers

Spark-submit creates only 1 executor when the pyspark interactive shell creates 4 (both using yarn-client)

I'm using the quickstart Cloudera VM (CDH 5.10.1) with PySpark (1.6.0) and YARN (MR2 included) to aggregate numerical data per hour. I've got 1 CPU with 4 cores and 32 GB of RAM. I've got a file named aggregate.py but until today I never submitted…
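In yarn-client mode the executor count does not scale up by itself; it comes from --num-executors (or from dynamic allocation, when enabled). A sketch with assumed sizes for a small 4-core VM; only aggregate.py is a name from the question:

```shell
# Sizes are assumptions for a 4-core machine with plenty of RAM;
# --num-executors is specific to running on YARN.
spark-submit \
  --master yarn-client \
  --num-executors 4 \
  --executor-cores 1 \
  --executor-memory 2g \
  aggregate.py
```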