Questions tagged [spark-submit]

spark-submit is a script used to launch Apache Spark applications written in, e.g., Java, Scala, or Python.

More information about spark-submit can be found in the official Spark documentation: https://spark.apache.org/docs/latest/submitting-applications.html

611 questions
1 vote · 1 answer

Scheduling a Spark job that inserts data into a Postgres DB in an Airflow environment

I want to schedule a Spark write operation to a Postgres DB. I have attached my code below. My Airflow task instance triggers before the hour. What can I do to make it run exactly on the hour, with only one task instance per DAG run? df = spark.read…
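A minimal sketch of the usual fix (hedged; the DAG id, task id, and script path are hypothetical): a cron schedule of "0 * * * *" fires exactly on the hour, and max_active_runs=1 with catchup=False keeps it to a single live run.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # "0 * * * *" triggers exactly on the hour; max_active_runs=1 and
    # catchup=False prevent overlapping or backfilled task instances.
    with DAG(
        dag_id="spark_postgres_write",      # hypothetical
        start_date=datetime(2021, 1, 1),
        schedule_interval="0 * * * *",
        max_active_runs=1,
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="spark_submit",
            bash_command="spark-submit /opt/jobs/write_to_postgres.py",  # hypothetical path
        )

Keep in mind that Airflow starts a run at the end of its schedule interval, which often explains trigger times that look off by an hour.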
1 vote · 0 answers

How do I execute a spark-submit job with a python3/python2-compiled mypthon3.zip on Databricks DRE 7.6 and above?

I am trying to execute a spark-submit job with a python3- and python2-compiled mypthon3.zip on Databricks DRE 7.6 and above, and I am getting the error below: Traceback (most recent call last): File "/dbfs/tmp/WT_SPARK3/Inputsql.py", line 1, in from…
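One hedged way to rule out packaging problems here (the /dbfs path comes from the traceback; everything else is illustrative): distribute the zip at runtime with addPyFile, the programmatic equivalent of spark-submit's --py-files flag.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Ships the archive to the driver and executors and puts it on the
    # PYTHONPATH, so modules inside the zip become importable.
    spark.sparkContext.addPyFile("/dbfs/tmp/WT_SPARK3/mypthon3.zip")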
1 vote · 0 answers

PySpark submit py-files as zip: java.io.FileNotFoundException

I am submitting a PySpark job with all modules packaged in a zip file, like so: $SPARK_HOME/bin/spark-submit \ --master local[*] \ --deploy-mode client \ --name spark-python \ --conf spark.driver.memory=4g \ --files…
Lorenz • 123 • 1 • 9
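For comparison, a complete minimal submit (file names are placeholders, not the asker's): the zip goes in --py-files, which ships it and adds it to the PYTHONPATH of the driver and executors, while the entry-point script remains a plain argument; passing the zip via --files alone would not make it importable.

    $SPARK_HOME/bin/spark-submit \
      --master local[*] \
      --deploy-mode client \
      --py-files modules.zip \
      main.py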
1 vote · 0 answers

spark.driver.extraLibraryPath overrides the original library path

I have a Spark job running on an AWS EMR cluster that needs access to a native library (*.so). Per Spark's documentation (https://spark.apache.org/docs/2.3.0/configuration.html), I need to add the "spark.driver.extraLibraryPath" and "spark.executor.extraLibraryPath" options in…
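The usual remedy (a sketch; /usr/lib/hadoop/lib/native is a typical EMR default, not verified for this cluster): extraLibraryPath replaces whatever the platform had set, so the existing native directory should be appended rather than overwritten.

    spark-submit \
      --conf spark.driver.extraLibraryPath=/opt/native:/usr/lib/hadoop/lib/native \
      --conf spark.executor.extraLibraryPath=/opt/native:/usr/lib/hadoop/lib/native \
      my_job.py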
1 vote · 1 answer

Set Spark context configuration while prioritizing spark-submit

I'm building a dockerized Spark application that is run through an entrypoint.sh file, which in turn runs spark-submit: #!/bin/bash export SPARK_DIST_CLASSPATH=$(hadoop classpath):$HADOOP_HOME/share/hadoop/* export _JAVA_OPTIONS="-Xms2g…
yatu • 86,083 • 12 • 84 • 139
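Per Spark's configuration documentation, properties set on SparkConf in code take precedence over spark-submit flags, which in turn beat spark-defaults.conf. So for spark-submit to win, the application should avoid hard-coding those settings; a minimal sketch:

    from pyspark.sql import SparkSession

    # No .master() or memory .config() calls here: whatever spark-submit
    # passes (--master, --conf spark.driver.memory=..., etc.) takes effect.
    spark = SparkSession.builder.appName("my-app").getOrCreate()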
1 vote · 1 answer

spark-submit to minikube: error related to krb5.conf

I'm following this file to spark-submit to minikube: https://gist.github.com/jjstill/8099669931cdfbb90ce6f4c307971514 This is my modified version, called spark-minikube.sh: minikube --memory 8192 --cpus 3 start kubectl create namespace…
user6308605 • 693 • 8 • 26
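For reference, the general shape of a submit against minikube's API server (image tag, namespace, and example class are placeholders based on the standard Spark-on-Kubernetes setup):

    spark-submit \
      --master k8s://https://$(minikube ip):8443 \
      --deploy-mode cluster \
      --name spark-pi \
      --conf spark.kubernetes.namespace=spark \
      --conf spark.kubernetes.container.image=spark:3.1.1 \
      --class org.apache.spark.examples.SparkPi \
      local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar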
1 vote · 0 answers

How to share state across multiple spark-submit jobs?

I am fairly new to Spark and was exploring the topic of submitting Spark jobs to a cluster. As per my understanding, every spark-submit job is a separate application in itself. As per our requirements, we need to access tables created by a Spark…
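One standard approach, assuming the jobs can share infrastructure (this is a general Spark pattern, not the poster's confirmed setup): point both applications at the same Hive metastore and warehouse, so tables persisted by one spark-submit application are readable by the next.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("consumer-job")     # hypothetical
             .enableHiveSupport()         # attach to the shared metastore
             .getOrCreate())

    # A table an earlier application saved with saveAsTable(...)
    # can now be read by name.
    df = spark.table("shared_db.events")  # hypothetical table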
1 vote · 0 answers

Create coverage XML with spark-submit

I need to generate a coverage XML file with the spark-submit command through the command line. I can generate coverage XML for Python code using the commands below: coverage run -p main.py coverage combine coverage xml Now, instead of directly running main.py, I…
isha • 143 • 1 • 10
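One hedged workaround, which sidesteps spark-submit rather than instrumenting it: if pyspark is pip-installed and main.py builds its own local-mode SparkSession, the driver code can run directly under coverage, and the usual combine/xml steps still apply.

    # main.py creates its own session, e.g.:
    #   spark = SparkSession.builder.master("local[*]").getOrCreate()
    coverage run -p main.py
    coverage combine
    coverage xml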
1 vote · 1 answer

PySpark container: spark-submitting a PySpark script throws a file-not-found error

Solution: add the following env variables to the container: export PYSPARK_PYTHON=/usr/bin/python3.9 export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.9 Trying to create a Spark container and spark-submit a PySpark script. I am able to create the…
coredump • 107 • 9
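Since the fix is just environment variables, they can equally be set in the container's entrypoint before calling spark-submit (the script path below is hypothetical):

    export PYSPARK_PYTHON=/usr/bin/python3.9
    export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.9
    spark-submit /app/script.py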
1 vote · 0 answers

NoSuchMethodError: com.google.common.hash.Hashing.crc32c

I'm trying to access Google Cloud Storage from my Spark code, but I'm getting the following error when creating a file in GCS: java.lang.NoSuchMethodError: com.google.common.hash.Hashing.crc32c()Lcom/google/common/hash/HashFunction; at…
pkgajulapalli • 1,066 • 3 • 20 • 44
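This is the classic Guava clash between Spark's bundled version and the GCS connector's. Two common mitigations, hedged because the right one depends on packaging: use Google's shaded connector jar, or ask Spark to prefer the application classpath (both userClassPathFirst settings are marked experimental; the jar name is a placeholder):

    spark-submit \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      --jars gcs-connector-hadoop2-latest.jar \
      my_job.py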
1 vote · 4 answers

Error: Failed to load class main using spark-submit

My code is below: import org.apache.spark.SparkContext; import org.apache.spark.SparkConf; object WordCounter { def main(args: Array[String]) { val conf = new SparkConf().setAppName("Word Counter").setMaster("local") val sc = new…
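The usual cause is a --class value that does not match the object's fully qualified name, or a jar that never contained it. Assuming the object above is packaged as shown, with no package declaration, the submit would look like this (the jar path is an sbt-style assumption):

    spark-submit \
      --class WordCounter \
      --master local \
      target/scala-2.11/wordcounter_2.11-1.0.jar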
1 vote · 1 answer

KubernetesPodOperator not sending arguments as expected

I have Airflow running a KubernetesPodOperator in order to make a spark-submit call: spark_image = f'{getenv("REGISTRY")}/myApp:{getenv("TAG")}' j2g = KubernetesPodOperator( dag=dag, task_id='myApp', name='myApp', namespace='data', …
Shkolar • 337 • 7 • 20
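A frequent gotcha with this operator (the app-specific values below are placeholders): cmds and arguments must be lists with one token per element, not a single shell-style string, or the container entrypoint receives them mangled.

    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )

    j2g = KubernetesPodOperator(
        dag=dag,                    # the DAG from the question
        task_id="myApp",
        name="myApp",
        namespace="data",
        image=spark_image,          # as built in the question
        cmds=["/opt/spark/bin/spark-submit"],                 # one token per item
        arguments=["--master", "local[*]", "/app/main.py"],   # hypothetical args
    )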
1 vote · 1 answer

Deploy a specific Spark version on a cluster

On my current project, I tried to deploy Spark 2.2 on a cluster where version 2.1 is available. I looked in the Spark documentation for a way to deploy specific dependencies on a cluster, which led me to use the following…
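One hedged approach when the cluster manager is YARN (the question does not say, so this is an assumption): install Spark 2.2 on the submitting machine only and let it ship its own runtime, so the cluster's pre-installed 2.1 is never used.

    export SPARK_HOME=/opt/spark-2.2.0          # client-side install
    $SPARK_HOME/bin/spark-submit \
      --master yarn \
      --conf spark.yarn.archive=hdfs:///libs/spark-2.2.0-jars.zip \
      my_job.py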
1 vote · 1 answer

Path of JARs added to a Spark job via spark-submit

I am using Spark 2.1 on a YARN cluster. I am trying to upload JARs to the YARN cluster and use them to replace the on-site (already in place) Spark JARs. I am trying to do so through spark-submit. The question Add jars to a Spark Job -…
dmdevito • 51 • 3
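On YARN, the set of Spark jars the application actually uses can be redirected to an uploaded location with spark.yarn.jars (the HDFS path and class are placeholders):

    spark-submit \
      --master yarn \
      --conf spark.yarn.jars=hdfs:///user/me/spark-jars/*.jar \
      --class com.example.App \
      app.jar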
1 vote · 0 answers

Docker container on EMR

I am trying to run my Python container on EMR with a main.py, using spark-submit --master yarn --deploy-mode cluster --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker --conf…
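For reference, a fuller version of that command using the conf keys from the EMR documentation on Docker runtimes (the ECR image URI is a placeholder); the application master needs the same two settings as the executors:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
      --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:latest \
      --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
      --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:latest \
      main.py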