Questions tagged [spark-submit]

spark-submit is the script shipped with Apache Spark for launching Spark applications written in, for example, Java, Scala, or Python.

More information about spark-submit can be found in the official Spark documentation.
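For orientation, a typical invocation looks roughly like the sketches below (the class name, master URL, and file names are placeholders, not taken from any particular question):

```shell
# Scala/Java: pass the application jar plus its entry-point class
spark-submit \
  --class com.example.Main \
  --master spark://host:7077 \
  app.jar arg1 arg2

# Python: the script itself is the application, no --class needed
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  my_job.py arg1 arg2
```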

611 questions
3 votes • 0 answers

Spark notebook is quicker than executing a jar

I have finished some code in a Spark notebook and tried to move it into a real project: I used sbt to generate a jar, then spark-submit to execute it. Problem: it takes just 10 minutes to get the result in the Spark notebook, but it takes almost…
Leyla Lee • 466 • 5 • 19
3 votes • 0 answers

Spark Metrics not viewable in VisualVM

I am trying to see the Spark metrics after configuring the metrics.properties file. This is the command I am using for spark-submit: /home/spark/spark/bin/spark-submit --class SparkRunner --master spark://x.x.x.x:7077 --files…
Shitij Goyal • 191 • 1 • 8
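One common pattern for getting custom metrics into VisualVM is to enable Spark's JMX sink and ship the metrics file with the job. The sketch below is hedged: the paths are placeholders, and it assumes a metrics.properties containing a line such as `*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink`:

```shell
# Ship metrics.properties to the executors and tell Spark to use it
# (paths and class/master values are placeholders)
spark-submit \
  --class SparkRunner \
  --master spark://x.x.x.x:7077 \
  --files /path/to/metrics.properties \
  --conf spark.metrics.conf=metrics.properties \
  app.jar
```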
3 votes • 2 answers

Can't connect with Mongo-Spark Connector using Mongo in Authentication mode

I'm trying to run a spark-submit job against a MongoDB instance on a remote machine, via the Mongo-Spark Connector. When I start the mongod service without the --auth flag and run the spark-submit command like this: ./bin/spark-submit --master…
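With authentication enabled, the connector generally needs credentials and an authSource in the connection URI. A hedged sketch in the connector 2.x property style (the host, database, credentials, and package version below are placeholders, not from the question):

```shell
# Pass authenticated MongoDB URIs via --conf; authSource names the
# database holding the user (all values are placeholders)
./bin/spark-submit \
  --master spark://host:7077 \
  --conf "spark.mongodb.input.uri=mongodb://user:password@remote-host:27017/mydb.mycoll?authSource=admin" \
  --conf "spark.mongodb.output.uri=mongodb://user:password@remote-host:27017/mydb.mycoll?authSource=admin" \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1 \
  app.jar
```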
3 votes • 0 answers

JVM options not taken into consideration, spark-submit of java program

I'm launching a Java program using spark-submit, but the JVM arguments I set are not taken into account. I'm trying to specify the max/min heap free ratio, yet even though the arguments are present (based on VisualVM),…
Majid • 654 • 2 • 7 • 28
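JVM flags passed directly on the command line are consumed by the launcher, not the driver or executor JVMs; they normally have to go through Spark's extraJavaOptions properties instead. A hedged sketch (class name and ratio values are placeholders; note that heap size itself must be set with spark.driver.memory/spark.executor.memory, not -Xmx here):

```shell
# Forward heap-free-ratio flags to the driver and executor JVMs
spark-submit \
  --class com.example.Main \
  --conf "spark.driver.extraJavaOptions=-XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40" \
  --conf "spark.executor.extraJavaOptions=-XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40" \
  app.jar
```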
3 votes • 2 answers

Value split is not a member of (String, String)

I am trying to read data from Kafka and store it into Cassandra tables through Spark RDDs. I get an error while compiling the code: /root/cassandra-count/src/main/scala/KafkaSparkCassandra.scala:69: value split is not a member of (String,…
3 votes • 1 answer

Read input file from jar while running application from spark-submit

I have an input file that is custom delimited and is passed to newAPIHadoopFile to be converted to an RDD[String]. The file resides under the project resource directory. The following code works well when run from the Eclipse IDE. val path =…
user1384205 • 1,231 • 3 • 20 • 39
3 votes • 0 answers

Import module doesn't work after zipping python dependencies for spark-submit

I'm new to the Spark world and I'm trying to launch some tests on Amazon EMR clusters using Spark 2.1.0 and Python 3.5. To do this I created a virtual environment with conda and zipped the site-packages with all the dependencies I need to…
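A frequent pitfall with this approach is zipping the site-packages directory itself rather than its contents, so the packages don't sit at the archive root where Python can import them. A hedged sketch (all paths are placeholders matching the question's Python 3.5 setup):

```shell
# Zip the *contents* of site-packages so each package is at the zip root,
# then ship the archive with --py-files (paths are placeholders)
cd /path/to/conda-env/lib/python3.5/site-packages
zip -r /tmp/deps.zip .

spark-submit \
  --master yarn \
  --py-files /tmp/deps.zip \
  my_job.py
```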
3 votes • 2 answers

SLF4J : simplelogger.properties in the project not detected

I am using Grizzled-SLF4J (a wrapper around SLF4J) for my Spark/Scala/SBT project. The property file simplelogger.properties has been placed in src/main/resources, but it is not detected when I run the application using…
Raj • 2,368 • 6 • 34 • 52
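Two workarounds worth trying when spark-submit's own classpath shadows an application resource: put the resource directory explicitly on the driver classpath, or set the equivalent simple-logger system properties directly. A hedged sketch (the path, class name, and chosen log level are placeholders):

```shell
# Either expose the resources directory on the driver classpath,
# or configure the SLF4J simple logger via system properties
spark-submit \
  --conf "spark.driver.extraClassPath=/path/to/src/main/resources" \
  --conf "spark.driver.extraJavaOptions=-Dorg.slf4j.simpleLogger.defaultLogLevel=debug" \
  --class com.example.Main \
  app.jar
```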
3 votes • 0 answers

Import external modules in Spark Python

I have an EMR job with a PySpark application. My code includes some external packages and some files for lookup. This is the hierarchy of the file system when I tried the same on a local box. [1] Wordcount.py -> spark file [2] Temp.py -> external…
Abu Tahir • 362 • 3 • 16
3 votes • 1 answer

"The filename, directory name, or volume label syntax is incorrect." while using spark-submit

I am using spark-submit to execute a jar file. Spark is located on my "C" drive and my Eclipse workspace is on the "D" drive. Though I am giving an absolute path for the jar file, I get the error saying "The filename, directory name, or volume label…
3 votes • 0 answers

Spark-submit python logging in executor

I am using Python to implement Spark jobs. We wanted to get the Python logging output from the application into the Spark history server, so we used the method outlined here: PySpark logging from the executor. However, the problem is that, since the…
feroze • 7,380 • 7 • 40 • 57
2 votes • 0 answers

REST to equivalent Unix command for dataproc job submit spark

I have a configuration and cluster set up in GCP and I can submit a Spark job, but I am trying to run cloud dataproc job submit spark from my CLI with the same configuration. I've set the service account locally; I am just unable to build the…
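For reference, the gcloud CLI form of a Dataproc Spark job submission looks roughly like this (cluster name, region, class, and bucket are placeholders, not values from the question):

```shell
# CLI equivalent of the Dataproc jobs.submit REST call for a Spark job;
# arguments after "--" are passed to the application itself
gcloud dataproc jobs submit spark \
  --cluster=my-cluster \
  --region=us-central1 \
  --class=com.example.Main \
  --jars=gs://my-bucket/app.jar \
  -- arg1 arg2
```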
2 votes • 1 answer

EMR Spark deploy mode when using Docker

I am deploying a Spark job in AWS EMR and packaging all my dependencies using Docker. My pythonized spark-submit command looks like this ... cmd = ( f"spark-submit --deploy-mode cluster " f"spark-submit --deploy-mode…
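On EMR, Docker-packaged dependencies are typically wired up through the YARN container runtime properties, which in turn assumes cluster deploy mode. A hedged sketch in the EMR 6.x style (the ECR image URI is a placeholder):

```shell
# Run both the application master and executors inside a Docker image
# (image URI is a placeholder; assumes Docker is enabled on the cluster)
spark-submit --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest \
  my_job.py
```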
2 votes • 0 answers

ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up. - Spark standalone cluster

A Spark job (Scala/S3) worked fine for a few runs in a standalone cluster with spark-submit, but after a few runs it started giving the error below. There were no changes to the code; it makes the connection to the spark-master, but immediately the application is getting…
2 votes • 0 answers

Why am I getting TransportRequestHandler: Error while executing a jar file with spark-submit?

I am trying to run a simple word-count example on a 3-node cluster: one node runs both the master and a worker, and the other 2 are worker-only nodes. When I execute "spark-submit CountWord.jar" in the terminal I get the error saying TransportRequestHandler: Error...closing…
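A first thing to check with a bare "spark-submit CountWord.jar": a jar application normally needs an explicit entry-point class and a master URL. A hedged sketch (the class name, master host, and input path are placeholders):

```shell
# Name the main class and the standalone master explicitly
spark-submit \
  --class com.example.CountWord \
  --master spark://master-host:7077 \
  CountWord.jar /path/to/input.txt
```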