Questions tagged [spark-submit]

spark-submit is the script shipped with Apache Spark for launching applications on a cluster. It can run apache-spark code written in e.g. Java, Scala or Python.

More information about spark-submit can be found in the official Spark documentation under "Submitting Applications".

611 questions
1
vote
1 answer

Is there a way to change the output format of spark-submit

I'm running a python script from spark-submit; the stdout from the script is output by spark-submit like this: [dd-MM-yyyy HH:MM] Line1 [dd-MM-yyyy HH:MM] Line2 [dd-MM-yyyy HH:MM] Line3 Is there any way to get it to output like…
Andy
  • 3,228
  • 8
  • 40
  • 65
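The [dd-MM-yyyy HH:MM] prefix usually comes from Spark's log4j ConsoleAppender rather than from the script itself, so the usual lever is the ConversionPattern in a custom log4j.properties. A minimal sketch, assuming Spark's bundled log4j 1.x (pattern chosen here only as an illustration):

```properties
# custom log4j.properties — emit the bare message, no [date] prefix
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%m%n
```

Pointing spark-submit at it is typically done with `--driver-java-options "-Dlog4j.configuration=file:/path/to/log4j.properties"`, or by editing `conf/log4j.properties` in the Spark installation.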
1
vote
0 answers

Error in packaging and deploying the pyspark application to cluster via spark-submit

I have a code structure like below:

my_app
|--- common
|    |--- __init__.py
|--- spark
|    |--- __init__.py
|    |--- subproject1
|    |    |--- __init__.py
|    |--- main.py
|--- job
|…
dks551
  • 1,113
  • 1
  • 15
  • 39
1
vote
2 answers

Not able to call "spark-submit" from within Scala via a system call, apparently because the "--jars" parameter (containing a * wildcard) is not expanded

Following "spark-submit" call works fine in shell /bin/bash -c '/local/spark-2.3.1-bin-hadoop2.7/bin/spark-submit --class analytics.tiger.agents.spark.Orsp --master spark://analytics.broadinstitute.org:7077 --deploy-mode client --executor-memory…
Nasko
  • 21
  • 3
1
vote
1 answer

How to ignore spark-submit warnings for pyspark

When I submit my python file to spark like this: spark-submit driver.py It starts showing a lot of warnings related to the Python 2 print method. 18/10/19 01:37:52 WARN ScriptBasedMapping: Exception running /etc/hadoop/conf/topology_script.py…
Avinash
  • 2,093
  • 4
  • 28
  • 41
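The in-application lever is `sc.setLogLevel("ERROR")` (or raising the level in log4j.properties), but that only takes effect after the context starts. For warnings printed before that, one hedged workaround is to post-filter captured stderr; a minimal sketch of such a filter:

```python
def drop_warn_lines(log_text: str) -> str:
    """Remove log4j WARN lines from captured spark-submit stderr.
    Crude string match — assumes the default ' WARN ' level marker."""
    kept = [line for line in log_text.splitlines()
            if " WARN " not in line]
    return "\n".join(kept)
```

In a shell pipeline the same effect is often achieved with `spark-submit driver.py 2> >(grep -v ' WARN ')`.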
1
vote
1 answer

Kafka Stream to Spark Stream python

We have a Kafka stream which uses Avro. I need to connect it to a Spark stream. I used the below code, as Lev G suggested: kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers}, valueDecoder=MessageSerializer.decode_message) I…
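If the producer uses Confluent's schema-registry serializer, each Kafka message carries a small header before the Avro payload, and a `valueDecoder` must strip it before deserializing. A sketch of that split, assuming the Confluent wire format (1 magic byte of 0x00 plus a 4-byte big-endian schema id — verify against your producer):

```python
import struct

def split_confluent_frame(message: bytes):
    """Split a Confluent-framed Kafka message into (schema_id, avro_payload).
    Assumed layout: 0x00 magic byte, then 4-byte big-endian schema id."""
    if len(message) < 5 or message[0] != 0:
        raise ValueError("not Confluent Avro wire format")
    schema_id = struct.unpack(">I", message[1:5])[0]
    return schema_id, message[5:]
```

A `valueDecoder` would then look up `schema_id` in the registry and decode `avro_payload` with that schema, which is roughly what `MessageSerializer.decode_message` from the confluent-kafka Python client does internally.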
1
vote
0 answers

Spark-submit cannot access hadoop file system in EMR?

I am trying to submit a job to YARN on another cluster using Marathon with a Docker container. The Docker container has the Hadoop and Spark binaries installed and has the correct paths for hadoop_conf_dir and yarn_conf_dir. However, when I try to…
user_01_02
  • 711
  • 2
  • 15
  • 31
1
vote
0 answers

Submitting sparkr job from rest api

The Spark hidden REST API (https://gist.github.com/arturmkrtchyan/5d8559b2911ac951d34a) has been proven useful to me for submitting Scala jobs. But is there any way to submit SparkR jobs through this API? I tried it but got this error: Exception in…
Piyush Shrivastava
  • 1,046
  • 2
  • 16
  • 43
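For reference, the gist submits a JSON `CreateSubmissionRequest` to `http://<master>:6066/v1/submissions/create`. A sketch of building that body in Python (field names follow the gist; whether the endpoint accepts an R script as `appResource` is exactly the open question here, so treat the SparkR usage as an assumption):

```python
import json

def build_submission_request(app_resource: str, spark_version: str,
                             main_class: str = None, app_args=None,
                             spark_props=None) -> str:
    """JSON body for POST http://<master>:6066/v1/submissions/create,
    following the fields documented in the gist."""
    body = {
        "action": "CreateSubmissionRequest",
        "appResource": app_resource,
        "clientSparkVersion": spark_version,
        "appArgs": app_args or [],
        "environmentVariables": {"SPARK_ENV_LOADED": "1"},
        "sparkProperties": spark_props or {},
    }
    if main_class:
        body["mainClass"] = main_class   # required for JVM jobs
    return json.dumps(body)
```

For a Scala job `appResource` is the jar and `mainClass` the entry point; for SparkR there is no obvious main class, which is likely related to the exception seen.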
1
vote
3 answers

Get the Exit status for failed Spark jobs when submitted through Spark-submit

I am submitting spark jobs using spark-submit in standalone mode. All these jobs are triggered via cron, and I want to monitor them for any failure. But with spark-submit, if any exception occurs in the application (e.g. ConnectionException) the…
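In client mode, spark-submit's own exit code reflects the driver process, so a cron wrapper can key off the return code. A sketch (note the caveat: in cluster mode on YARN the launcher may return 0 even when the application fails, so polling the application state is then needed):

```python
import subprocess
import sys

def run_and_report(cmd) -> int:
    """Run a spark-submit command and surface its exit code for
    cron/monitoring; non-zero means the driver terminated abnormally."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print(f"job failed with exit code {result.returncode}",
              file=sys.stderr)
    return result.returncode
```

Usage from cron would be e.g. `run_and_report(["spark-submit", "driver.py"])`, alerting whenever the returned code is non-zero.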
1
vote
1 answer

spark read contents of zip file in HDFS

I am trying to read data from a zip file. I can read a whole text file as below: val f = sc.wholeTextFiles("hdfs://") but I don't know how to read the text data inside the zip file. Is there any possible way to do it? If yes, please let me know.
sande
  • 567
  • 1
  • 10
  • 24
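`wholeTextFiles` does not decompress zip archives (unlike gzip, zip is not a Hadoop codec), so the usual approach is `sc.binaryFiles` plus manual unzipping per file. The per-archive decode can be sketched, and tested, without a cluster:

```python
import io
import zipfile

def zip_bytes_to_lines(data: bytes):
    """Decode one zip archive (raw bytes, e.g. the value from
    sc.binaryFiles) into the text lines of all member files."""
    lines = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            with zf.open(name) as member:
                lines.extend(member.read().decode("utf-8").splitlines())
    return lines
```

On the cluster this would be wired up as roughly `sc.binaryFiles("hdfs://...").flatMap(lambda kv: zip_bytes_to_lines(kv[1]))` in PySpark; the Scala version uses `ZipInputStream` over the `PortableDataStream` the same way.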
1
vote
0 answers

Spark-submit command options --num-executors issue

I have the following spark configuration: 1 master and 2 workers. Each worker has 88 cores, hence 176 cores in total. Each worker has 502 GB memory, so the total memory available is 1004 GB. Now I want to run 40 executors so that all the cores will…
Raj
  • 707
  • 6
  • 23
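With 2 workers of 88 cores each, 40 executors means 20 per worker, i.e. at most 4 cores per executor once a core is reserved for the OS/daemons. A sizing sketch (the one-core and ~10% memory reservations are rule-of-thumb assumptions, not Spark requirements):

```python
def executor_sizing(workers: int, cores_per_worker: int,
                    mem_gb_per_worker: int, num_executors: int):
    """Even split of cores/memory across executors, reserving 1 core
    and ~10% of memory per worker for OS/daemons (assumed overhead)."""
    per_worker = num_executors // workers
    cores = (cores_per_worker - 1) // per_worker
    mem_gb = int(mem_gb_per_worker * 0.9) // per_worker
    return {"executors_per_worker": per_worker,
            "executor_cores": cores,
            "executor_memory_gb": mem_gb}
```

For the numbers in the question this maps to roughly `--num-executors 40 --executor-cores 4 --executor-memory 22G` (note that in standalone mode the executor count is driven by `--total-executor-cores` and `--executor-cores` rather than `--num-executors`, which is a YARN flag).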
1
vote
4 answers

spark elasticsearch: Multiple ES-Hadoop versions detected in the classpath

I'm new to spark. I'm trying to run a spark job that loads data to elasticsearch. I've built a fat jar from my code and used it during spark-submit. spark-submit \ --class CLASS_NAME \ --master yarn \ --deploy-mode cluster \ --num-executors…
pkgajulapalli
  • 1,066
  • 3
  • 20
  • 44
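This error means ES-Hadoop classes reach the classpath from more than one place, typically both bundled inside the fat jar and added again via `--jars`/cluster libs. A hypothetical helper that scans classpath entries by jar name to locate the duplicates (note it cannot see classes shaded *inside* a fat jar, which must be checked by inspecting the assembly):

```python
import re

ES_JAR = re.compile(r"elasticsearch-(hadoop|spark)[-\w.]*\.jar$")

def es_hadoop_jars(classpath_entries):
    """Return the classpath entries that look like ES-Hadoop artifacts;
    more than one match is the usual trigger for the
    'Multiple ES-Hadoop versions detected' error."""
    return [e for e in classpath_entries if ES_JAR.search(e)]
```

The usual fix is to keep exactly one source: either mark the ES-Hadoop dependency as `provided` in the build, or stop passing it separately to spark-submit.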
1
vote
0 answers

Nifi Job to execute a spark submit command not giving correct results

I have spark code that appends data from a hive table to parquet files partitioned on dates. The code runs correctly when executed from the spark shell, and the parquet files show exactly the same number of rows as present in the hive table…
1
vote
1 answer

spark-submit with Mahout error on cluster mode (Scala/java)

I'm trying to build a basic recommender with Spark and Mahout in Scala. I used the following mahout repo to compile mahout with scala 2.11 and spark 2.1.2: mahout_fork To execute my code I use spark-submit, and it runs fine when I pass --master local, but…
1
vote
1 answer

Where to set "spark.yarn.executor.memoryOverhead"

I am getting the following error while running my spark-scala program. YarnSchedulerBackend$YarnSchedulerEndpoint: Container killed by YARN for exceeding memory limits. 2.6GB of 2.5GB physical memory used. Consider boosting…
Don Sam
  • 525
  • 5
  • 20
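The property is set per job, not in code after startup: e.g. `spark-submit --conf spark.yarn.executor.memoryOverhead=1024 …` (this is the Spark ≤2.2 key; later versions renamed it `spark.executor.memoryOverhead`). The 2.5 GB limit in the error comes from executor memory plus the default overhead, which on YARN is max(384 MB, 10% of executor memory); a sketch of that default:

```python
def default_memory_overhead_mb(executor_memory_mb: int) -> int:
    """Spark-on-YARN default overhead: max(384 MB, 10% of executor memory)."""
    return max(384, int(executor_memory_mb * 0.10))
```

So a 2 GB executor gets 2048 + 384 ≈ 2.5 GB of container memory; raising the overhead (or the executor memory) via `--conf` lifts the limit YARN enforces.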
1
vote
0 answers

Spark job creating only 1 stage task when executed

I am trying to load data from DB2 to Hive using Spark 2.1.1 and Scala 2.11. The code used is given below: import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.sql import org.apache.spark.sql.SparkSession import…
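A single-task JDBC read is the usual cause here: without partitioning options, Spark issues one query against DB2 and builds one partition. Supplying `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions` makes it issue parallel range queries. A sketch of the option set (column name and bounds are hypothetical; the partition column must be numeric or a date):

```python
def jdbc_partition_options(url: str, table: str, partition_column: str,
                           lower: int, upper: int, num_partitions: int) -> dict:
    """Options for spark.read.format('jdbc') that split the read into
    num_partitions range queries instead of a single task."""
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }
```

In PySpark this would be used as `spark.read.format("jdbc").options(**opts).load()`; the Scala equivalent passes the same keys to `spark.read.format("jdbc").option(...)`.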