Questions tagged [spark-submit]

spark-submit is a script that runs apache-spark applications written in, for example, Java, Scala, or Python.

More information about spark-submit can be found here.
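
As a rough illustration of how a spark-submit invocation is put together, the Python sketch below assembles the argument list. The helper `build_spark_submit` and its parameters are hypothetical names chosen for this example and are not part of Spark; `--master`, `--class`, `--jars` and `--conf` are standard spark-submit options, while `Code.py` and the configuration value are placeholders.

```python
import shlex


def build_spark_submit(app, master="local[*]", main_class=None,
                       jars=(), conf=None, app_args=()):
    """Assemble a spark-submit argument list (hypothetical helper, not a Spark API)."""
    cmd = ["spark-submit", "--master", master]
    if main_class:
        # JVM applications name their entry point with --class.
        cmd += ["--class", main_class]
    if jars:
        # Extra jars are passed as one comma-separated value.
        cmd += ["--jars", ",".join(jars)]
    for key, value in sorted((conf or {}).items()):
        # Each Spark property becomes its own --conf key=value pair.
        cmd += ["--conf", f"{key}={value}"]
    # The application jar or .py file comes after all options,
    # followed by the application's own arguments.
    cmd.append(app)
    cmd += list(app_args)
    return cmd


# Placeholder values, for illustration only.
cmd = build_spark_submit("Code.py", conf={"spark.executor.memory": "2g"})
print(shlex.join(cmd))  # shell-quoted form, suitable for copying into a terminal
```

Building the command as a list rather than a single string keeps values such as `local[*]` from being expanded by the shell when the list is handed to `subprocess.run`.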

611 questions
2
votes
1 answer

Is it possible (and wise) to execute other "spark-submit" inside a JavaRDD?

I'm trying to execute a Spark program with spark-submit (in particular GATK Spark tools, so the command is not spark-submit, but something similar). This program accepts only one input, so I'm trying to write some Java code in order to accept more…
Vzzarr
  • 4,600
  • 2
  • 43
  • 80
2
votes
1 answer

adding external property file to classpath in spark

I am currently submitting my fat jar to a Spark cluster using the command below. The application fat jar and related configuration are in the folder /home/myapplication. $SPARK_HOME/bin/spark-submit --jars $SPARK_HOME/lib/protobuf-java-2.5.0.jar --class…
chandu ram
  • 251
  • 2
  • 5
  • 19
2
votes
1 answer

spark load a s3a file as a df, which command to run?

I have a JSON file that is valid; I can successfully import it on a local Spark machine: DF = sqlContext.read.json("/home/me/myfile.json"). I have a shell script to submit the job: /home/me/spark/bin/spark-submit \ --master local[*] Code.py So far…
S12000
  • 3,345
  • 12
  • 35
  • 51
2
votes
1 answer

Spark submit: Table or view not found using jar

When I run HiveRead.java from the IntelliJ IDE, it runs successfully and I get a result. Then I created a jar file (it's a Maven project) and tried to run it from the IDE; it gave me: ClassLoaderResolver for class "" gave error on creation : {1} Then I…
Saurab
  • 1,931
  • 5
  • 20
  • 33
2
votes
1 answer

Spark application override yarn-site.xml config parameters

I need to override one Yarn configuration parameter in yarn-site.xml when I submit a Spark application. Can I pass it as an extra param to spark-submit? The parameter I want to override is yarn.nodemanager.vmem-check-enabled
Sam
  • 11,799
  • 9
  • 49
  • 68
2
votes
1 answer

HiveContext - unable to access hbase table mapped in hive as external table

I am trying to access an HBase table mapped in Hive using HiveContext in Spark, but I am getting ClassNotFoundException errors. Below is my code. import org.apache.spark.sql.hive.HiveContext val sqlContext = new HiveContext(sc) val df =…
user2731629
  • 402
  • 1
  • 7
  • 17
2
votes
3 answers

add a python external library in Pyspark

I'm using PySpark (1.6) and I want to use the databricks:spark-csv library. For this I've tried different ways with no success. 1- I've tried to add a jar I downloaded from https://spark-packages.org/package/databricks/spark-csv, and run pyspark --jars…
bhr
  • 31
  • 1
  • 3
2
votes
1 answer

can't add alluxio.security.login.username to spark-submit

I have a spark driver program which I'm trying to set the alluxio user for. I read this post: How to pass -D parameter or environment variable to Spark job? and although helpful, none of the methods in there seem to do the trick. My environment: -…
jb44
  • 393
  • 1
  • 6
  • 23
2
votes
0 answers

spark-submit: adding property file to driver class path

I need to put a properties file that my Spark application uses on the Spark driver classpath. As per the documentation, it looks like --driver-class-path should do this, but it didn't work for me. I tried the following (let's say /home/myuser/ is the…
Bijith Kumar
  • 181
  • 7
2
votes
0 answers

NullPointer Exception when submitting spark jobs via REST api

I am trying to build an application which submits Spark jobs remotely and monitors the status of the submitted job. I found http://arturmkrtchyan.com/apache-spark-hidden-rest-api which describes a REST API to submit jobs and fetch status.…
user_777
  • 21
  • 3
2
votes
3 answers

Spark + Kafka streaming NoClassDefFoundError kafka/serializer/StringDecoder

I'm trying to send messages from my Kafka producer and stream them in Spark Streaming, but I'm getting the following error when I run my application with spark-submit. Error: Exception in thread "main" java.lang.NoClassDefFoundError:…
Gaurav Ram
  • 1,085
  • 3
  • 16
  • 32
2
votes
2 answers

I am getting the below error while trying to execute spark submit using oozie on emr

I am running in cluster mode. The apacheds-kerberos-codec-2.0.0-M15.jar is present in multiple places in oozie/share/lib/lib*/spark and oozie/share/lib/lib*/oozie. Is this an environment issue? java.lang.IllegalArgumentException: Attempt to add…
Rinin
  • 23
  • 1
  • 3
2
votes
0 answers

Yarn-Cluster mode - ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms

In my PySpark program I have: from pyspark import SparkConf, SparkContext, SQLContext conf=SparkConf() conf.setAppName("spark_name") conf.set("spark.dynamicAllocation.enabled", "true") conf.set("spark.shuffle.service.enabled", "true") …
user491
  • 175
  • 1
  • 4
  • 20
1
vote
1 answer

Spark job succeeded in Airflow but no result seen in Spark UI

I'm a beginner with Airflow and Spark, and I am currently setting up a data pipeline locally using both. The DAG I want to build has just one task that runs a PySpark job on Spark. The dags folder of my application contains two…
KuRu
  • 31
  • 5
1
vote
0 answers

Spark submit error - cannot load main class from jar - PySpark

I'm running the spark-submit command below and got an error that says: cannot load main class from jar file:/path/to/dependency.zip. I'm struggling to understand why it looks for the main class in the zip file, since I supplied the application.py, which…
user3735871
  • 527
  • 2
  • 14
  • 31