Questions tagged [spark-submit]

spark-submit is the script used to submit and run apache-spark applications written in, e.g., Java, Scala, or Python.

More information about spark-submit can be found in the official Spark documentation on submitting applications.

611 questions
7
votes
0 answers

Run external python dependencies with spark-submit?

I have a test.py file: import pandas as pd; import numpy as np; import tensorflow as tf; from sklearn.externals import joblib; import tqdm; import time; print("Successful import"). I have followed this method to create an independent zip of all…
user9316498
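A hedged sketch of the two common approaches (file names below are hypothetical): pure-Python dependencies can be zipped and shipped with --py-files, while packages with native extensions (pandas, numpy, tensorflow) usually need a packed environment shipped with --archives.

```bash
# Ship pure-Python modules as a zip (this does not work for
# C-extension packages such as numpy or tensorflow).
spark-submit --master yarn --py-files deps.zip test.py

# Ship a packed virtualenv/conda environment instead and point the
# workers' Python interpreter at it (YARN example).
spark-submit --master yarn \
  --archives environment.tar.gz#environment \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
  test.py
```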
7
votes
1 answer

What causes "unknown resolver null" in Spark Kafka Connector?

I am new to Spark. I have started ZooKeeper and Kafka (0.10.1.1) locally, as well as Spark standalone (2.2.0) with one master and two workers. My local Scala version is 2.12.3. I was able to run WordCount on Spark and to use the Kafka console producer and consumer…
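One frequent cause in this kind of setup, offered as a hedged guess: Spark 2.2.0 is built against Scala 2.11, so Kafka integration artifacts must carry the _2.11 suffix regardless of the locally installed Scala 2.12.3. A sketch of a matching submit (script name is hypothetical):

```bash
# The artifact's Scala suffix must match the Scala version Spark was
# built with (2.11 for Spark 2.2.0), not the local Scala install.
spark-submit --master spark://localhost:7077 \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 \
  my_streaming_app.py
```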
7
votes
4 answers

Best practice to create SparkSession object in Scala to use both in unittest and spark-submit

I have tried to write a transform method from DataFrame to DataFrame, and I also want to test it with ScalaTest. As you know, in Spark 2.x with the Scala API, you can create a SparkSession object as follows: import org.apache.spark.sql.SparkSession val…
Joo-Won Jung
  • 151
  • 1
  • 2
  • 6
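A common pattern, sketched here under assumptions (class and jar names are hypothetical): leave .master() unset in the production code so ScalaTest can build its own local[*] session, while the cluster master is supplied at submit time.

```bash
# In tests: SparkSession.builder().master("local[*]").getOrCreate()
# In production code: no .master() call; the master comes from here.
spark-submit \
  --master spark://master-host:7077 \
  --class com.example.MyTransformJob \
  target/scala-2.11/myapp-assembly-0.1.0.jar
```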
7
votes
3 answers

NoClassDefFoundError: Could not initialize XXX class after deploying on spark standalone cluster

I wrote a Spark Streaming application built with sbt. It works perfectly fine locally, but after deploying it on the cluster, it complains about a class I wrote that is clearly in the fat jar (checked using jar tvf). The following is my project…
Dr.Pro
  • 213
  • 5
  • 11
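A hedged diagnostic sketch: "Could not initialize class" means the class was found but its static initializer threw on an executor (for example, an object val that only works on the driver); that is different from ClassNotFoundException. Verifying the assembly and submitting it directly (names are hypothetical):

```bash
# Confirm the class is in the fat jar (as the question already did),
# then submit the assembly itself as the application jar.
jar tvf target/scala-2.11/myapp-assembly-0.1.0.jar | grep MyClass
spark-submit --master spark://master-host:7077 \
  --class com.example.StreamingMain \
  target/scala-2.11/myapp-assembly-0.1.0.jar
```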
7
votes
1 answer

How to spark-submit a python file in spark 2.1.0?

I am currently running Spark 2.1.0. I have worked most of the time in the PySpark shell, but I need to spark-submit a Python file (similar to spark-submit for a jar in Java). How do you do that in Python?
Kalyan
  • 1,880
  • 11
  • 35
  • 62
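A minimal sketch, assuming a script named my_script.py: submitting a Python file works like submitting a jar, except there is no --class and the script path is passed directly.

```bash
spark-submit --master local[4] my_script.py arg1 arg2
# On a cluster, swap the master URL, e.g. --master yarn
```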
7
votes
3 answers

Apache Spark -- using spark-submit throws a NoSuchMethodError

To submit a Spark application to a cluster, the documentation notes: To do this, create an assembly jar (or “uber” jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark…
user4728253
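A hedged sketch of the usual fix: a NoSuchMethodError at runtime typically means the Spark version bundled into the assembly differs from the cluster's Spark, so the Spark artifacts should be scoped "provided" in the build and the assembly run against the cluster's own jars (names below are hypothetical):

```bash
# In build.sbt (shown as a comment to keep this a shell sketch):
#   "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
spark-submit --master spark://master-host:7077 \
  --class com.example.Main \
  target/scala-2.11/myapp-assembly-0.1.0.jar
```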
6
votes
1 answer

Submitting pyspark job with multiple python files and one configuration file

I have four Python scripts and one .txt configuration file. Of the four Python files, one contains the entry point for the Spark application and also imports functions from the other Python files. The configuration file, however, is imported in some other Python file…
Jay
  • 296
  • 10
  • 25
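A minimal sketch, with hypothetical file names: helper modules travel via --py-files, the configuration file via --files, and the shipped copy is then resolved with SparkFiles inside the job.

```bash
spark-submit --master yarn \
  --py-files utils.py,etl.py,model.py \
  --files config.txt \
  main.py
# Inside any of the shipped modules:
#   from pyspark import SparkFiles
#   config_path = SparkFiles.get("config.txt")
```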
6
votes
1 answer

Task is running on only one executor in spark

I am running the code below in Spark using Java. Test.java: package com.sample; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; import…
A Learner
  • 157
  • 1
  • 5
  • 16
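A hedged sketch of the usual suspects: a single-partition input or low parallelism keeps all tasks on one executor, so request executors explicitly and raise the parallelism (numbers are illustrative):

```bash
spark-submit --master yarn \
  --num-executors 4 --executor-cores 2 \
  --conf spark.default.parallelism=16 \
  --class com.sample.Test target/test-1.0.jar
# Inside the job, dataset.repartition(n) spreads a skewed or
# single-partition dataset across executors.
```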
6
votes
3 answers

Can num-executors override dynamic allocation in spark-submit

Can specifying --num-executors in the spark-submit command override already enabled dynamic allocation (spark.dynamicAllocation.enabled true)?
Arvind Kumar
  • 1,325
  • 1
  • 19
  • 27
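For reference, the interaction is version-dependent: in older Spark-on-YARN releases, setting --num-executors (i.e. spark.executor.instances) together with dynamic allocation disabled dynamic allocation with a logged warning, while newer releases treat it as the initial executor count. A sketch of keeping dynamic allocation but bounding it instead:

```bash
spark-submit --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  my_job.py
```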
6
votes
3 answers

Failed to submit local jar to spark cluster: java.nio.file.NoSuchFileException

~/spark/spark-2.1.1-bin-hadoop2.7/bin$ ./spark-submit --master spark://192.168.42.80:32141 --deploy-mode cluster file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar Running Spark using the REST application submission…
BAE
  • 8,550
  • 22
  • 88
  • 171
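A hedged sketch of the usual explanation: with --deploy-mode cluster on standalone, the driver starts on a worker machine, so a file:// path must exist there. Either run in client mode or place the jar where every node can fetch it (HDFS, HTTP, or the same local path on all workers). The standard standalone master port (7077) is assumed below:

```bash
# Client mode keeps the driver on the submitting machine, so the
# local path resolves.
spark-submit --master spark://192.168.42.80:7077 \
  --deploy-mode client \
  file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar
```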
6
votes
2 answers

How to set Spark application exit status?

I'm writing a Spark application and running it with the spark-submit shell script (using yarn-cluster/yarn-client). As I see it now, the exit code of spark-submit is determined by the status of the related YARN application: 0 if it is SUCCEEDED, otherwise 1. I want…
roh
  • 123
  • 1
  • 1
  • 10
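A sketch of the launcher side, assuming yarn-client mode: a driver that ends with System.exit(n) (or sys.exit(n) in Python) surfaces its code directly through spark-submit, while yarn-cluster mode generally collapses the result to 0 (SUCCEEDED) or 1.

```bash
spark-submit --master yarn --deploy-mode client my_job.py
status=$?
if [ "$status" -ne 0 ]; then
  echo "Spark job failed with exit code $status" >&2
  exit "$status"
fi
```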
5
votes
0 answers

Databricks PySpark with PEX: how can I configure a PySpark job on Databricks using PEX for dependencies?

I am attempting to create a PySpark job via the Databricks UI (with spark-submit) using the spark-submit parameters below (the dependencies are in the PEX file), but I am getting an exception that the PEX file does not exist. It's my understanding that…
r_g_s_
  • 224
  • 1
  • 8
5
votes
4 answers

PySpark packages installation on kubernetes with Spark-Submit: ivy-cache file not found error

I have been fighting with this the whole day. I am able to install and use a package (graphframes) with spark-shell or a connected Jupyter notebook, but I would like to move it to the Kubernetes-based Spark environment with spark-submit. My Spark version:…
kostjaigin
  • 125
  • 2
  • 8
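A commonly reported workaround, sketched under assumptions (the API server address and script path are placeholders): the driver container's default Ivy directory is often not writable on Kubernetes, so point spark.jars.ivy at a writable location such as /tmp.

```bash
spark-submit \
  --master k8s://https://<k8s-apiserver>:443 \
  --deploy-mode cluster \
  --conf spark.jars.ivy=/tmp/.ivy \
  --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12 \
  local:///opt/spark/app/my_app.py
```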
5
votes
2 answers

Spark on EMR-5.32.0 not spawning requested executors

I am running into some problems with (Py)Spark on EMR (release 5.32.0). Approximately a year ago I ran the same program on an EMR cluster (I think the release must have been 5.29.0), and back then I was able to configure my PySpark program using spark-submit…
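One hedged possibility: EMR enables dynamic allocation by default (since release 4.4), so explicit executor counts can be overridden. Disabling it and sizing executors by hand is one way to get a fixed allocation (numbers are illustrative and must fit the instance type):

```bash
spark-submit --master yarn \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 10 --executor-cores 4 --executor-memory 8g \
  my_program.py
```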
5
votes
0 answers

Executing a Spark/Scala jar using spark-submit vs executing it using java -jar

I came across an interesting question about the different methods of submitting a Spark application from a Windows development environment. Generally, we submit a Spark job using spark-submit, but we can also execute an uber jar (with dependent Spark…
Sandeep Singh
  • 7,790
  • 4
  • 43
  • 68
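The contrast in a nutshell, as a hedged sketch: spark-submit wires in the cluster manager, spark-defaults.conf, and the cluster's Spark jars, while plain java -jar only works when the uber jar bundles Spark itself and the code builds its own (typically local) SparkSession.

```bash
spark-submit --master local[*] --class com.example.Main app-assembly.jar
# versus running the assembly directly, bypassing spark-submit entirely:
java -jar app-assembly.jar
```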