Questions tagged [spark-submit]

spark-submit is the script shipped with Apache Spark (in its bin directory) for launching applications on a cluster; it runs apache-spark code written in Java, Scala, Python, or R.

More information about spark-submit can be found in the official Spark documentation, under the Submitting Applications guide.
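
A minimal invocation looks like the following sketch (master, deploy mode, class name, and paths are placeholders to adapt):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.Main \
      path/to/application.jar arg1 arg2

For a Python application, the script path simply replaces the jar and --class is omitted.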

611 questions
0 votes, 1 answer

How to pass configuration parameters from a file as environment variables for spark job?

I am running a Spark application which will use configuration parameters from a file. File: Spark.conf username=ankush password=ankush host=https:// port=22 outputDirectory=/home/ankush/data/ How do I use this file at runtime, instead of restarting the…
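
A common pattern for this kind of setup, sketched here under the assumption of YARN and a plain key=value file, is to ship the file with --files and/or promote individual keys to environment variables of the driver and executors:

    # file path, key names, and script name below are illustrative
    spark-submit \
      --files /home/ankush/spark.conf \
      --conf spark.yarn.appMasterEnv.OUTPUT_DIR=/home/ankush/data/ \
      --conf spark.executorEnv.OUTPUT_DIR=/home/ankush/data/ \
      my_job.py

The shipped file is then readable inside the job by its base name (spark.conf) from the container's working directory.
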
0 votes, 0 answers

JSON Decoding error in spark2 cluster mode, works fine in client mode

Using the spark-submit command (Spark 2, CDH 5.9) to run a Python script, I am getting the following JSON decoding error only in cluster mode (client mode is fine): e.g. Traceback (most recent call last): File "dummy.py", line 115, in cfg =…
trailblazer (215)
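
A frequent cause of client-vs-cluster differences like this is that in cluster mode the driver no longer runs on the submitting machine, so locally referenced files are missing. A sketch of one workaround, assuming the configuration being decoded is a local JSON file:

    # ship the local file so the driver container can see it (paths are placeholders)
    spark-submit --deploy-mode cluster --files /local/path/cfg.json dummy.py
    # inside dummy.py the shipped copy is available as ./cfg.json in the container's working directory
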
0 votes, 1 answer

Where does a Spark Driver run in cluster mode and can it be controlled?

While I have several pieces of documentation suggesting that the driver runs on its own node (the master) and the executors run on slave nodes (also called workers), I have somehow become confused by this. Hence I would like to confirm the following if…
MaatDeamon (9,532)
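
For reference, the deploy mode is what decides where the driver runs; a quick sketch with YARN:

    spark-submit --master yarn --deploy-mode client  app.jar   # driver runs on the machine that called spark-submit
    spark-submit --master yarn --deploy-mode cluster app.jar   # driver runs inside the cluster, in a YARN container
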
0 votes, 1 answer

java.lang.NoClassDefFoundError: better/files/File in spark-submit for Scala code

When I export the program into a jar file and execute it, I get a java.lang.NoClassDefFoundError: better/files/File error. The code I'm using is below. Thanks in advance for any assistance. SBT: name := "testFunctions" version := "1.0" scalaVersion…
Jay (23)
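
A NoClassDefFoundError for a third-party class usually means the dependency is not on the classpath at runtime; a plain sbt package jar does not bundle it. Two common remedies, sketched with assumed coordinates and paths:

    # 1) let spark-submit resolve the dependency at submit time
    #    (the better-files coordinates/version here are an assumption; adjust to your build)
    spark-submit --class com.example.Main \
      --packages com.github.pathikrit:better-files_2.11:3.8.0 \
      target/scala-2.11/your-app_2.11-1.0.jar

    # 2) or build a fat jar with sbt-assembly and submit that instead
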
0 votes, 1 answer

spark-submit : pass runtime variable to spark script

I am submitting a PySpark/Spark SQL script using spark-submit and I need to pass runtime variables (a database name) to the script. spark-submit command: spark-submit --conf database_parameter=my_database my_pyspark_script.py pyspark…
Shantanu Sharma (3,661)
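
One way this is commonly handled (a sketch, not necessarily the asker's setup): pass the value either as a plain application argument or under a spark.-prefixed conf key, since spark-submit ignores --conf names that do not start with spark.:

    # as an application argument
    spark-submit my_pyspark_script.py my_database
    # inside the script: import sys; database = sys.argv[1]

    # or as a conf key (spark.myapp.database is an illustrative name)
    spark-submit --conf spark.myapp.database=my_database my_pyspark_script.py
    # inside the script (assuming a SparkSession named spark):
    #   database = spark.sparkContext.getConf().get("spark.myapp.database")
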
0 votes, 1 answer

Remote spark-submit not working via paramiko

My Spark program is on a remote Ubuntu system. Now I want to execute it from a Windows system using paramiko (a Python package for SSH2 connections), i.e. a program on Windows for remote execution of the Spark program. The problem is that I can execute python…
gddxz (35)
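
A minimal paramiko sketch for this scenario, assuming password authentication and that spark-submit is on the remote PATH (host, user, and paths are placeholders):

    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("remote-ubuntu-host", username="ubuntu", password="***")
    # run through a login shell so SPARK_HOME/PATH from the profile are picked up
    stdin, stdout, stderr = client.exec_command(
        "bash -lc 'spark-submit /home/ubuntu/my_spark_job.py'")
    print(stdout.read().decode())
    print(stderr.read().decode())
    client.close()
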
0 votes, 2 answers

Force Python2 with spark-submit

I am creating a Spark application with AWS EMR, but spark-submit runs with Python 3 instead of Python 2, whereas when I run pyspark it uses Python 2. How can I force spark-submit to use Python 2? I tried to do export…
Pierre (938)
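
The interpreter PySpark uses is controlled by the PYSPARK_PYTHON / PYSPARK_DRIVER_PYTHON environment variables (or, on Spark 2.1+, the spark.pyspark.python conf). A sketch with an assumed interpreter path:

    PYSPARK_PYTHON=/usr/bin/python2.7 PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7 \
        spark-submit my_app.py

    # equivalent, per submit, on Spark 2.1+
    spark-submit --conf spark.pyspark.python=/usr/bin/python2.7 my_app.py

On EMR these can also be set cluster-wide through a configuration classification, but the exact mechanism depends on the EMR release.
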
0 votes, 1 answer

How do I add a Python module from inside conda's site-package directory to spark-submit?

I need to run a PySpark application (v1.6.3). There is the --py-files flag to add .zip, .egg, or .py files. If I had a Python package/module at /usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy, how would I include this whole module? Inside…
Jane Wayne (8,205)
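
For a pure-Python package like this, one workable approach (a sketch; paths are taken from the question) is to zip the package directory so its top-level name stays importable and ship the zip with --py-files:

    cd /usr/anaconda2/lib/python2.7/site-packages
    zip -r /tmp/fuzzywuzzy.zip fuzzywuzzy
    spark-submit --py-files /tmp/fuzzywuzzy.zip my_app.py
    # my_app.py can then simply: import fuzzywuzzy

This works because --py-files entries are added to the PYTHONPATH on the driver and executors; packages with compiled extensions need a different treatment.
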
0 votes, 1 answer

spark-submit error: ClassNotFoundException

build.sbt lazy val commonSettings = Seq( organization := "com.me", version := "0.1.0", scalaVersion := "2.11.0" ) lazy val counter = (project in file("counter")). settings(commonSettings:_*) counter/build.sbt name :=…
BAE (8,550)
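
With a multi-project sbt build, the usual checklist is that --class names the fully qualified main class and that the class actually ends up in the jar being submitted. A sketch with hypothetical class and jar names:

    # confirm the class is inside the packaged jar
    jar tf counter/target/scala-2.11/counter_2.11-0.1.0.jar | grep -i main
    # then submit with the matching fully qualified name
    spark-submit --class com.me.counter.Main \
      counter/target/scala-2.11/counter_2.11-0.1.0.jar
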
0 votes, 3 answers

Runtime error on Scala Spark 2.0 code

I have the following code: import org.apache.spark.sql.SparkSession . . . val spark = SparkSession .builder() .appName("PTAMachineLearner") .getOrCreate() When it executes, I get the following…
Paul Reiners (8,576)
0 votes, 1 answer

Submit Spark jobs via User Interface

Is there any way to submit Spark jobs via YARN through some UI, or even to submit apps written in IntelliJ? What are the best workarounds for submitting jobs in a company setting? We are using Apache Ambari, where we installed YARN, Hadoop, and Spark. Thanks :)
0 votes, 2 answers

IBM Bluemix spark-submit

I'm new to Bluemix. I have created the Apache Spark service and I tried to submit a simple hello-world jar through spark-submit. (I used this link to follow:…
user1271254 (1)
0 votes, 1 answer

Launching Spark job with Oozie fails (Error MetricsSystem)

I have a Spark jar that I launch with spark-submit and it works fine (reading files, generating an RDD, storing in HDFS). However, when I try to launch the same jar within an Oozie job (oozie:spark-action), the Spark job fails. When I looked at the logs,…
OUMOUSS_ELMEHDI (499)
0 votes, 1 answer

Spark submit from client machine

We have Hadoop implemented on a Linux platform. We use Scala Spark to develop models using the Spark machine learning libraries. I just use Notepad++, create *.scala files, and execute them on the data nodes. I want to know whether I can use the Eclipse or IntelliJ IDE…
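
A typical flow (a sketch with placeholder host and artifact names) is to build the jar in the IDE or with sbt on the client machine, copy it to an edge/gateway node that has the Hadoop and Spark client configuration, and submit from there:

    sbt package
    scp target/scala-2.11/mymodel_2.11-0.1.jar user@edge-node:/home/user/
    ssh user@edge-node \
      spark-submit --master yarn --class com.example.MyModel /home/user/mymodel_2.11-0.1.jar
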
0 votes, 1 answer

Read a text file in pyspark and spark-submit

Assuming I run a Python script (file1.py) which takes a text file as a parameter, and I run it as follows: python file1.py textfile1.txt. Inside file1.py is the following code: from pyspark import SparkContext .... #I can read the file using the…
userInThisWorld (1,361)
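
The same argument-passing pattern carries over to spark-submit; a sketch of both sides, assuming the path is visible to the cluster (e.g. on HDFS when running in cluster mode):

    spark-submit file1.py textfile1.txt

    # inside file1.py
    import sys
    from pyspark import SparkContext

    sc = SparkContext(appName="file1")
    rdd = sc.textFile(sys.argv[1])
    print(rdd.count())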