Questions tagged [spark-submit]

spark-submit is the script shipped with Apache Spark for launching applications on a cluster. It can run apache-spark code written in e.g. Java, Scala or Python.

More information about spark-submit can be found in the official Spark documentation under "Submitting Applications".

611 questions
1
vote
1 answer

Is there a way to change the output format of spark-submit

I'm running a python script from spark-submit; the stdout from the script is output by spark-submit like this: [dd-MM-yyyy HH:MM] Line1 [dd-MM-yyyy HH:MM] Line2 [dd-MM-yyyy HH:MM] Line3 Is there any way to get it to output like…
Andy
  • 3,228
  • 8
  • 40
  • 65
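The [dd-MM-yyyy HH:MM] prefix usually comes from Spark's log4j ConsoleAppender rather than from the script itself, so the usual lever is the ConversionPattern in a custom log4j.properties. A minimal sketch, assuming Spark's bundled log4j 1.x (pattern chosen here only as an illustration):

```properties
# custom log4j.properties — emit the bare message, no [date] prefix
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%m%n
```

Pointing spark-submit at it is typically done with `--driver-java-options "-Dlog4j.configuration=file:/path/to/log4j.properties"`, or by editing `conf/log4j.properties` in the Spark installation.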
1
vote
0 answers

Error in packaging and deploying the pyspark application to cluster via spark-submit

I have a code structure like below:

my_app
|--- common
|    |--- __init__.py
|--- spark
|    |--- __init__.py
|    |--- subproject1
|    |    |--- __init__.py
|    |--- main.py
|--- job
|…
dks551
  • 1,113
  • 1
  • 15
  • 39
1
vote
2 answers

Not able to call "spark-submit" from within Scala via a system call, apparently because the "--jars" parameter (containing a * wildcard) is not expanded

Following "spark-submit" call works fine in shell /bin/bash -c '/local/spark-2.3.1-bin-hadoop2.7/bin/spark-submit --class analytics.tiger.agents.spark.Orsp --master spark://analytics.broadinstitute.org:7077 --deploy-mode client --executor-memory…
Nasko
  • 21
  • 3
1
vote
1 answer

How to ignore spark-submit warnings for pyspark

When I submit my python file to spark like this: spark-submit driver.py It starts showing a lot of warnings related to the Python 2 print method. 18/10/19 01:37:52 WARN ScriptBasedMapping: Exception running /etc/hadoop/conf/topology_script.py…
Avinash
  • 2,093
  • 4
  • 28
  • 41
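The in-application lever is `sc.setLogLevel("ERROR")` (or raising the level in log4j.properties), but that only takes effect after the context starts. For warnings printed before that, one hedged workaround is to post-filter captured stderr; a minimal sketch of such a filter:

```python
def drop_warn_lines(log_text: str) -> str:
    """Remove log4j WARN lines from captured spark-submit stderr.
    Crude string match — assumes the default ' WARN ' level marker."""
    kept = [line for line in log_text.splitlines()
            if " WARN " not in line]
    return "\n".join(kept)
```

In a shell pipeline the same effect is often achieved with `spark-submit driver.py 2> >(grep -v ' WARN ')`.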
1
vote
1 answer

Kafka Stream to Spark Stream python

We have a Kafka stream which uses Avro. I need to connect it to a Spark stream. I used the below code, as Lev G suggested: kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers}, valueDecoder=MessageSerializer.decode_message) I…
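If the producer uses Confluent's schema-registry serializer, each Kafka message carries a small header before the Avro payload, and a `valueDecoder` must strip it before deserializing. A sketch of that split, assuming the Confluent wire format (1 magic byte of 0x00 plus a 4-byte big-endian schema id — verify against your producer):

```python
import struct

def split_confluent_frame(message: bytes):
    """Split a Confluent-framed Kafka message into (schema_id, avro_payload).
    Assumed layout: 0x00 magic byte, then 4-byte big-endian schema id."""
    if len(message) < 5 or message[0] != 0:
        raise ValueError("not Confluent Avro wire format")
    schema_id = struct.unpack(">I", message[1:5])[0]
    return schema_id, message[5:]
```

A `valueDecoder` would then look up `schema_id` in the registry and decode `avro_payload` with that schema, which is roughly what `MessageSerializer.decode_message` from the confluent-kafka Python client does internally.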
1
vote
0 answers

Spark-submit cannot access hadoop file system in EMR?

I am trying to submit a job to YARN on another cluster using Marathon with a Docker container. The Docker container has the Hadoop and Spark binaries installed and has the correct paths for hadoop_conf_dir and yarn_conf_dir. However, when I try to…
user_01_02
  • 711
  • 2
  • 15
  • 31
1
vote
0 answers

Submitting sparkr job from rest api

The Spark hidden REST API (https://gist.github.com/arturmkrtchyan/5d8559b2911ac951d34a) has been proven useful to me for submitting Scala jobs. But is there any way to submit SparkR jobs through this API? I tried it but got this error: Exception in…
Piyush Shrivastava
  • 1,046
  • 2
  • 16
  • 43
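For reference, the gist submits a JSON `CreateSubmissionRequest` to `http://<master>:6066/v1/submissions/create`. A sketch of building that body in Python (field names follow the gist; whether the endpoint accepts an R script as `appResource` is exactly the open question here, so treat the SparkR usage as an assumption):

```python
import json

def build_submission_request(app_resource: str, spark_version: str,
                             main_class: str = None, app_args=None,
                             spark_props=None) -> str:
    """JSON body for POST http://<master>:6066/v1/submissions/create,
    following the fields documented in the gist."""
    body = {
        "action": "CreateSubmissionRequest",
        "appResource": app_resource,
        "clientSparkVersion": spark_version,
        "appArgs": app_args or [],
        "environmentVariables": {"SPARK_ENV_LOADED": "1"},
        "sparkProperties": spark_props or {},
    }
    if main_class:
        body["mainClass"] = main_class   # required for JVM jobs
    return json.dumps(body)
```

For a Scala job `appResource` is the jar and `mainClass` the entry point; for SparkR there is no obvious main class, which is likely related to the exception seen.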
1
vote
3 answers

Get the Exit status for failed Spark jobs when submitted through Spark-submit

I am submitting spark jobs using spark-submit in standalone mode. All these jobs are triggered via cron, and I want to monitor them for any failure. But with spark-submit, if any exception occurs in the application (e.g. ConnectionException) the…
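In client mode, spark-submit's own exit code reflects the driver process, so a cron wrapper can key off the return code. A sketch (note the caveat: in cluster mode on YARN the launcher may return 0 even when the application fails, so polling the application state is then needed):

```python
import subprocess
import sys

def run_and_report(cmd) -> int:
    """Run a spark-submit command and surface its exit code for
    cron/monitoring; non-zero means the driver terminated abnormally."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print(f"job failed with exit code {result.returncode}",
              file=sys.stderr)
    return result.returncode
```

Usage from cron would be e.g. `run_and_report(["spark-submit", "driver.py"])`, alerting whenever the returned code is non-zero.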
1
vote
1 answer

spark read contents of zip file in HDFS

I am trying to read data from a zip file. I can read a whole text file as below: val f = sc.wholeTextFiles("hdfs://") but I don't know how to read the text data inside the zip file. Is there any possible way to do it? If yes, please let me know.
sande
  • 567
  • 1
  • 10
  • 24
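`wholeTextFiles` does not decompress zip archives (unlike gzip, zip is not a Hadoop codec), so the usual approach is `sc.binaryFiles` plus manual unzipping per file. The per-archive decode can be sketched, and tested, without a cluster:

```python
import io
import zipfile

def zip_bytes_to_lines(data: bytes):
    """Decode one zip archive (raw bytes, e.g. the value from
    sc.binaryFiles) into the text lines of all member files."""
    lines = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            with zf.open(name) as member:
                lines.extend(member.read().decode("utf-8").splitlines())
    return lines
```

On the cluster this would be wired up as roughly `sc.binaryFiles("hdfs://...").flatMap(lambda kv: zip_bytes_to_lines(kv[1]))` in PySpark; the Scala version uses `ZipInputStream` over the `PortableDataStream` the same way.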
1
vote
0 answers

Spark-submit command options --num-executors issue

I have the following spark configuration: 1 master and 2 workers. Each worker has 88 cores, hence 176 cores in total. Each worker has 502 GB memory, so the total memory available is 1004 GB. Now I want to run 40 executors so that all the cores will…
Raj
  • 707
  • 6
  • 23
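With 2 workers of 88 cores each, 40 executors means 20 per worker, i.e. at most 4 cores per executor once a core is reserved for the OS/daemons. A sizing sketch (the one-core and ~10% memory reservations are rule-of-thumb assumptions, not Spark requirements):

```python
def executor_sizing(workers: int, cores_per_worker: int,
                    mem_gb_per_worker: int, num_executors: int):
    """Even split of cores/memory across executors, reserving 1 core
    and ~10% of memory per worker for OS/daemons (assumed overhead)."""
    per_worker = num_executors // workers
    cores = (cores_per_worker - 1) // per_worker
    mem_gb = int(mem_gb_per_worker * 0.9) // per_worker
    return {"executors_per_worker": per_worker,
            "executor_cores": cores,
            "executor_memory_gb": mem_gb}
```

For the numbers in the question this maps to roughly `--num-executors 40 --executor-cores 4 --executor-memory 22G` (note that in standalone mode the executor count is driven by `--total-executor-cores` and `--executor-cores` rather than `--num-executors`, which is a YARN flag).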
1
vote
4 answers

spark elasticsearch: Multiple ES-Hadoop versions detected in the classpath

I'm new to spark. I'm trying to run a spark job that loads data to elasticsearch. I've built a fat jar from my code and used it during spark-submit. spark-submit \ --class CLASS_NAME \ --master yarn \ --deploy-mode cluster \ --num-executors…
pkgajulapalli
  • 1,066
  • 3
  • 20
  • 44
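This error means ES-Hadoop classes reach the classpath from more than one place, typically both bundled inside the fat jar and added again via `--jars`/cluster libs. A hypothetical helper that scans classpath entries by jar name to locate the duplicates (note it cannot see classes shaded *inside* a fat jar, which must be checked by inspecting the assembly):

```python
import re

ES_JAR = re.compile(r"elasticsearch-(hadoop|spark)[-\w.]*\.jar$")

def es_hadoop_jars(classpath_entries):
    """Return the classpath entries that look like ES-Hadoop artifacts;
    more than one match is the usual trigger for the
    'Multiple ES-Hadoop versions detected' error."""
    return [e for e in classpath_entries if ES_JAR.search(e)]
```

The usual fix is to keep exactly one source: either mark the ES-Hadoop dependency as `provided` in the build, or stop passing it separately to spark-submit.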
1
vote
0 answers

Nifi Job to execute a spark submit command not giving correct results

I have spark code that appends data from a hive table to parquet files partitioned on dates. The code runs correctly when executed from the spark shell, and the parquet files show exactly the same number of rows as present in the hive table…
1
vote
1 answer

spark-submit with Mahout error on cluster mode (Scala/java)

I'm trying to build a basic recommender with Spark and Mahout in Scala. I used the following mahout repo to compile mahout with scala 2.11 and spark 2.1.2: mahout_fork To execute my code I use spark-submit, and it runs fine when I pass --master local, but…
1
vote
1 answer

Where to set "spark.yarn.executor.memoryOverhead"

I am getting the following error while running my spark-scala program. YarnSchedulerBackend$YarnSchedulerEndpoint: Container killed by YARN for exceeding memory limits. 2.6GB of 2.5GB physical memory used. Consider boosting…
Don Sam
  • 525
  • 5
  • 20
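The property is set per job, not in code after startup: e.g. `spark-submit --conf spark.yarn.executor.memoryOverhead=1024 …` (this is the Spark ≤2.2 key; later versions renamed it `spark.executor.memoryOverhead`). The 2.5 GB limit in the error comes from executor memory plus the default overhead, which on YARN is max(384 MB, 10% of executor memory); a sketch of that default:

```python
def default_memory_overhead_mb(executor_memory_mb: int) -> int:
    """Spark-on-YARN default overhead: max(384 MB, 10% of executor memory)."""
    return max(384, int(executor_memory_mb * 0.10))
```

So a 2 GB executor gets 2048 + 384 ≈ 2.5 GB of container memory; raising the overhead (or the executor memory) via `--conf` lifts the limit YARN enforces.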
1
vote
0 answers

Spark job creating only 1 stage task when executed

I am trying to load data from DB2 to Hive using Spark 2.1.1 and Scala 2.11. The code used is given below: import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.sql import org.apache.spark.sql.SparkSession import…
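A single-task JDBC read is the usual cause here: without partitioning options, Spark issues one query against DB2 and builds one partition. Supplying `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions` makes it issue parallel range queries. A sketch of the option set (column name and bounds are hypothetical; the partition column must be numeric or a date):

```python
def jdbc_partition_options(url: str, table: str, partition_column: str,
                           lower: int, upper: int, num_partitions: int) -> dict:
    """Options for spark.read.format('jdbc') that split the read into
    num_partitions range queries instead of a single task."""
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }
```

In PySpark this would be used as `spark.read.format("jdbc").options(**opts).load()`; the Scala equivalent passes the same keys to `spark.read.format("jdbc").option(...)`.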