Questions tagged [spark-submit]

spark-submit is the script shipped with apache-spark for launching Spark applications written in, for example, Java, Scala, Python or R.

More information about spark-submit can be found in the "Submitting Applications" guide of the Spark documentation: https://spark.apache.org/docs/latest/submitting-applications.html
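
For orientation, here is a minimal PySpark application together with an illustrative spark-submit invocation; the file name, master URL and data are made up for the example, not taken from any question below.

    # wordcount_example.py -- launch with, for example:
    #   spark-submit --master local[2] wordcount_example.py
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount-example").getOrCreate()

    # A tiny in-memory dataset so the script runs without external files.
    lines = spark.sparkContext.parallelize(["hello spark", "hello submit"])
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    print(counts.collect())
    spark.stop()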

611 questions
1 vote, 0 answers

Spark-submit issue - looking for non-existent path

I am trying to run spark-submit: /usr/local/Cellar/apache-spark/2.3.0/libexec/bin/spark-submit sdp-consumer.py It gives an error: /usr/local/Cellar/apache-spark/2.3.0/libexec/bin/spark-submit: line 27:…
Joe • 11,983 • 31 • 109 • 183
1 vote, 0 answers

Cannot pickle PySpark DataFrame

I want to create a decision tree model using spark-submit. from pyspark.mllib.regression import LabeledPoint from pyspark.mllib.tree import DecisionTree from pyspark import SparkConf, SparkContext from numpy import array from pyspark.sql import…
betty bth • 33 • 7
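
A minimal sketch of the kind of pyspark.mllib decision-tree training the question above starts from, using a tiny hand-made dataset; it persists the fitted model with model.save rather than pickling any Spark object, since SparkContext-backed objects such as DataFrames generally cannot be pickled. The data and output path are hypothetical.

    # decision_tree_sketch.py -- illustrative only; data and paths are made up.
    from pyspark.sql import SparkSession
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.tree import DecisionTree

    spark = SparkSession.builder.appName("decision-tree-sketch").getOrCreate()
    sc = spark.sparkContext

    # Toy labelled data: LabeledPoint(label, [features]).
    data = sc.parallelize([
        LabeledPoint(0.0, [0.0, 1.0]),
        LabeledPoint(1.0, [1.0, 0.0]),
        LabeledPoint(1.0, [1.0, 1.0]),
        LabeledPoint(0.0, [0.0, 0.0]),
    ])

    model = DecisionTree.trainClassifier(
        data, numClasses=2, categoricalFeaturesInfo={}, maxDepth=3)

    print(model.toDebugString())
    # Persist the model itself instead of pickling a DataFrame or RDD.
    model.save(sc, "file:///tmp/decision_tree_model")  # path is hypothetical
    spark.stop()
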
1 vote, 1 answer

How to set spark.driver.extraClassPath through Apache Livy on Azure Spark cluster?

I would like to add some configuration when a Spark job is submitted via Apache Livy to an Azure cluster. Currently, to launch a Spark job via Apache Livy on the cluster, I use the following command: curl -X POST --data '{"file":…
moun • 69 • 1 • 6
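
A sketch of passing extra Spark configuration in the JSON body of a Livy batch submission, using Python's requests instead of curl; the endpoint, file path, class name and class-path value are placeholders, and whether the cluster honours a driver setting supplied this way is the open question above.

    # Illustrative Livy batch submission; URL and paths are placeholders.
    import json
    import requests

    livy_url = "http://<livy-host>:8998/batches"            # hypothetical endpoint
    payload = {
        "file": "wasb:///example/jars/my-spark-job.jar",     # placeholder
        "className": "com.example.MyJob",                    # placeholder
        "conf": {
            # Extra Spark configuration travels in the "conf" map of the request.
            "spark.driver.extraClassPath": "/path/to/extra/libs/*",
        },
    }

    response = requests.post(
        livy_url,
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
    )
    print(response.status_code, response.text)
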
1 vote, 0 answers

PySpark fails with exit code 52

I have an Amazon EMR cluster running, to which I submit jobs using the spark-submit shell command. The way I call it: spark-submit --master yarn --driver-memory 10g convert.py The convert.py script runs under PySpark with Python 3.4. After…
1 vote, 1 answer

Submit Python Script into Spark Cluster

I'm trying to submit the following Python script to a Spark cluster. I have 2 slaves running. from sklearn import grid_search, datasets from sklearn.ensemble import RandomForestClassifier # Use spark_sklearn's grid search instead: from…
syv • 3,528 • 7 • 35 • 50
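
A sketch of the spark-sklearn pattern the excerpt hints at, assuming the spark_sklearn package is installed on the driver and executors: its GridSearchCV takes the SparkContext as its first argument and otherwise mirrors scikit-learn's API. Dataset and parameter grid are illustrative.

    # grid_search_sketch.py -- assumes spark-sklearn is available on the cluster.
    from pyspark.sql import SparkSession
    from sklearn import datasets
    from sklearn.ensemble import RandomForestClassifier
    from spark_sklearn import GridSearchCV  # distributed drop-in for sklearn's GridSearchCV

    spark = SparkSession.builder.appName("spark-sklearn-grid-search").getOrCreate()
    sc = spark.sparkContext

    digits = datasets.load_digits()
    param_grid = {"n_estimators": [10, 50], "max_depth": [4, 8]}

    # The SparkContext goes first; the individual fits are farmed out to executors.
    search = GridSearchCV(sc, RandomForestClassifier(), param_grid)
    search.fit(digits.data, digits.target)
    print(search.best_params_)

    spark.stop()
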
1 vote, 0 answers

Create fat runnable jar for spark-submit

I'm writing a Spark application using Java (not Scala). Something like: SparkConf conf = new SparkConf().setAppName("TEST"); JavaSparkContext sc = new JavaSparkContext(conf); sc.setLogLevel("WARN"); I started with a simple Java project and when to…
DXC • 21 • 1
1 vote, 1 answer

PySpark failing in Jupyter after setting PYSPARK_SUBMIT_ARGS

I'm trying to load a Spark (2.2.1) package in a Jupyter notebook that can otherwise run Spark fine. Once I add %env PYSPARK_SUBMIT_ARGS='--packages com.databricks:spark-redshift_2.10:2.0.1 pyspark-shell' I get this error when trying to create a…
lfk • 2,423 • 6 • 29 • 46
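
If the quotes on the %env line are being kept as part of the value (a possible cause, not confirmed here), one workaround is to set the variable from plain Python before any Spark session is created; the package coordinates below are copied from the excerpt, the rest is a generic sketch.

    # In a notebook cell, before pyspark is started:
    import os

    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        "--packages com.databricks:spark-redshift_2.10:2.0.1 pyspark-shell"
    )

    # Only now create the session, so the launcher picks up the arguments.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("jupyter-redshift-sketch").getOrCreate()
    print(spark.version)
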
1 vote, 1 answer

Spark standalone connection driver to worker

I'm trying to host a Spark standalone cluster locally. I have two heterogeneous machines connected on a LAN. Each piece of the architecture listed below is running on Docker. I have the following configuration: master on machine 1 (port 7077…
Matthias Beaupère • 1,731 • 2 • 17 • 44
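
For driver-to-worker connectivity problems like this one, the driver's network settings are the usual knobs. A sketch, with placeholder addresses, of pinning the driver's advertised host and bind address when connecting to a standalone master from inside a container:

    # driver_network_sketch.py -- addresses and ports are placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("standalone-connectivity-sketch")
             .master("spark://192.168.1.10:7077")           # standalone master (placeholder)
             .config("spark.driver.host", "192.168.1.20")    # address the workers can reach
             .config("spark.driver.bindAddress", "0.0.0.0")  # useful when the driver runs in Docker
             .getOrCreate())

    # A trivial job to confirm executors can call back to the driver.
    print(spark.sparkContext.parallelize(range(100)).sum())
    spark.stop()
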
1 vote, 1 answer

Spark Application Not reading log4j.properties present in Jar

I am using MapR 5.2 with Spark 2.1.0, and I am running my Spark app JAR in YARN cluster mode. I have tried all the available options that I found but have been unable to succeed. This is our production environment, but I need this for my particular Spark…
AJm • 993 • 2 • 20 • 39
1 vote, 0 answers

'Cannot allocate memory' when submitting a Spark job

I got an error when trying to submit a Spark job to YARN, but I can't understand which JVM threw this error. How can I avoid it? Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006bff80000, 3579314176, 0) failed;…
lulijun • 415 • 3 • 22
1 vote, 1 answer

spark-submit to a docker container

I created a Spark cluster using this repository and the related documentation. Now I'm trying to execute a job through spark-submit inside the Docker container of the Spark master, so the command that I use is something…
Vzzarr • 4,600 • 2 • 43 • 80
1 vote, 1 answer

Spark-submitted application not shown in YARN web UI

I have a node where I have installed Spark in YARN mode. When I run an application with sudo ./usr/bin/spark-submit --master yarn --deploy-mode client MySparkCode.py it runs fine. When I connect to the Spark history server at http://localhost:18089/ I…
Michail N • 3,647 • 2 • 32 • 51
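
One quick diagnostic is to have the application report, from inside the job, which master it actually ran against and what application ID it was given; if the master turns out not to be yarn, the submission never reached the ResourceManager, which would explain its absence from the YARN UI. A generic sketch:

    # yarn_check_sketch.py -- submit exactly as in the question and inspect the output.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("yarn-ui-check").getOrCreate()
    sc = spark.sparkContext

    # On YARN this prints "yarn" and an id of the form application_<timestamp>_<n>.
    print("master:        ", sc.master)
    print("applicationId: ", sc.applicationId)
    spark.stop()
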
1 vote, 0 answers

Spark fails on task with SparkException: Can only zip RDDs with same... with no direct zip() call

I'm using Spark with a spark-submit call and a Python script for cluster analysis that I send with it. From the script: spark = SparkSession.builder.appName(results.taskName).getOrCreate() dataset =…
Ran P • 332 • 2 • 4 • 11
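
For background on the exception itself: zip() requires both RDDs to have the same number of partitions and the same number of elements in each partition, and it is also called by some library code internally, which is why it can surface without an explicit zip() in user code. The sketch below reproduces the constraint and shows the usual index-and-join workaround; it is generic, not a reconstruction of the failing script.

    # zip_constraint_sketch.py -- illustrates why the exception appears.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("zip-constraint-sketch").getOrCreate()
    sc = spark.sparkContext

    a = sc.parallelize(range(10), 2)
    b = sc.parallelize(range(10, 20), 2)
    c = sc.parallelize(range(5), 2)           # different element count

    print(a.zip(b).take(3))                    # fine: same partitioning and counts

    # a.zip(c).collect() would raise:
    #   "Can only zip RDDs with same number of elements in each partition"

    # Workaround that tolerates mismatched sizes: index both sides and join.
    joined = (a.zipWithIndex().map(lambda kv: (kv[1], kv[0]))
               .join(c.zipWithIndex().map(lambda kv: (kv[1], kv[0]))))
    print(joined.take(3))
    spark.stop()
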
1 vote, 0 answers

Copy files (config) from HDFS to local working directory of every spark executor

I am looking for a way to copy a folder of resource-dependency files from HDFS to the local working directory of each Spark executor using Java. At first I was thinking of using the --files FILES option of spark-submit, but it seems it does not support…
YuGagarin • 341 • 7 • 20
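
In PySpark, the closest equivalent to what the question describes (the question itself uses Java) is SparkContext.addFile with recursive=True, after which SparkFiles.get resolves the executor-local copy. The HDFS path and file names below are placeholders.

    # distribute_config_sketch.py -- HDFS path and file names are placeholders.
    from pyspark import SparkFiles
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("distribute-config-sketch").getOrCreate()
    sc = spark.sparkContext

    # Ship a whole directory from HDFS to every executor's working area.
    sc.addFile("hdfs:///config/my_job_resources", recursive=True)

    def read_local_copy(_):
        # Resolve the executor-local path of the distributed directory.
        local_dir = SparkFiles.get("my_job_resources")
        with open(local_dir + "/settings.conf") as fh:   # placeholder file name
            return fh.read()[:80]

    print(sc.parallelize([0], 1).map(read_local_copy).collect())
    spark.stop()
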
1 vote, 0 answers

Trying to run a spark-submit job on a YARN cluster but I keep getting the following warning. How do I fix the issue?

WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources. I have looked through similar questions and tried everything else that was…
Sonia S • 15 • 5
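
That warning generally means the application is asking for more executor memory or cores than any node can currently offer, or that no healthy workers are registered at all. One common mitigation, sketched below with arbitrary example values, is to shrink the per-executor request until it fits the cluster:

    # small_footprint_sketch.py -- example values only; size them to the cluster.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("resource-fit-sketch")
             .config("spark.executor.memory", "1g")
             .config("spark.executor.cores", "1")
             .config("spark.executor.instances", "2")
             .getOrCreate())

    # A trivial action: if resources were granted, this returns promptly.
    print(spark.sparkContext.parallelize(range(1000)).count())
    spark.stop()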