Questions tagged [spark-submit]

spark-submit is the script used to launch Apache Spark applications written in, e.g., Java, Scala, or Python.

More information about spark-submit can be found in the official Spark documentation under "Submitting Applications".
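For reference, a minimal invocation looks like the following sketch; the master URL, class name, paths, and arguments are all placeholders:

```shell
# Minimal spark-submit sketch; every name below is a placeholder.
# --master selects the cluster manager; --class names the JVM entry point.
./bin/spark-submit \
  --master local[4] \
  --class com.example.Main \
  path/to/app.jar arg1 arg2
```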

611 questions
I am relatively new to Airflow and Spark and want to use the Kubernetes Operator in an Airflow DAG to run a spark-submit command
1 vote · 1 answer
I am using Kubernetes 1.25 (client and server) and have deployed Airflow on the environment using the official Helm charts. I want the Airflow DAG's Kubernetes pod operator, which contains the code that triggers the spark-submit operation, to spawn the…
Restrict executor and driver memory and core parameters in spark-submit on a Spark-only cluster
1 vote · 0 answers
We have a Spark-only cluster for multiple users; spark-defaults.conf has the following properties set: spark.driver.memory 2g, spark.executor.cores 1, spark.executor.memory 2g. Since we have multiple users, I don't want users to pass the below parameters in…
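Worth noting for this question: spark-defaults.conf only supplies defaults, and a user can still override them with --conf or command-line flags at submit time, so hard caps typically also need cluster-manager-side limits. A sketch with illustrative values (spark.cores.max applies on standalone clusters):

```
# spark-defaults.conf - illustrative values, not recommendations
spark.driver.memory    2g
spark.executor.memory  2g
spark.executor.cores   1
# Standalone mode: cap the total cores any single application may take
spark.cores.max        4
```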
ModuleNotFoundError: No module named X when using the foreach function with PySpark
1 vote · 1 answer
I currently encounter an error when using an external Python module (orjson) inside the foreach function with PySpark. Everything is fine if I use that module outside the foreach function (the collect() method). Below is my simple code: from pyspark.sql import…
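The usual cause of this pattern is that the driver's Python environment has the module but the executors' environment does not: functions passed to foreach() run on executors, while collect() brings data back to the driver. One sketch is to ship a packed environment at submit time (the archive name is hypothetical):

```shell
# Ship a packed virtualenv to the executors and point their Python at it.
spark-submit \
  --archives pyspark_venv.tar.gz#environment \
  --conf spark.pyspark.python=./environment/bin/python \
  my_job.py
```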
ClassNotFoundException: org.apache.beam.runners.spark.io.SourceRDD$SourcePartition during spark-submit
1 vote · 1 answer
I use spark-submit against a Spark standalone cluster to execute my shaded jar, but the executor gets this error: 22/12/06 15:21:25 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 1) (10.37.2.77, executor 0, partition 0, PROCESS_LOCAL, 5133 bytes)…
spark-submit error - "Local jar does not exists, skipping"
1 vote · 0 answers
Spark: 3.3.0 (spark-3.3.0-bin-hadoop3). I am trying to run spark-submit with jar files, but I receive "jar does not exist" from DependencyUtils even though the JAR is in place. As a consequence the mainClass is not found. Command: ./spark-submit…
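When DependencyUtils reports a missing local jar, it is resolving the path on the machine running spark-submit, so an absolute path (optionally with an explicit file:// scheme) is the usual first check. A sketch with hypothetical paths:

```shell
ls -l /opt/jobs/app.jar        # confirm the jar exists where spark-submit runs
./bin/spark-submit \
  --class com.example.Main \
  file:///opt/jobs/app.jar
```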
Standard way to store/upload an application jar on a Spark cluster on Kubernetes
1 vote · 0 answers
I have a Spark-based Kubernetes cluster where I use spark-submit to submit jobs to the cluster as needed, e.g. spark-submit \ --master spark://my-spark-master-svc:7077 \ --class com.Main \ …
— asked by adesai
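Two common sketches for getting the application jar to a Spark-on-Kubernetes job (image name, API server URL, and paths hypothetical): bake the jar into the container image and reference it with the local:// scheme, which means "already present inside the container", or host it on storage every pod can reach (HDFS, S3, an HTTP server) and pass that URL instead:

```shell
spark-submit \
  --master k8s://https://kube-apiserver:6443 \
  --deploy-mode cluster \
  --class com.Main \
  --conf spark.kubernetes.container.image=registry.example.com/spark-app:1.0 \
  local:///opt/spark/jars/app.jar
```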
spark-submit error loading class with a fat jar on macOS
1 vote · 1 answer
I am trying to run a simple hello-world Spark application. This is my code: package com.sd.proj.executables import org.apache.spark.sql.functions.lit import org.apache.spark.sql.{DataFrame, SparkSession} class SparkConn { def…
— asked by dsam05
Spark configuration based on my data size
1 vote · 2 answers
I know there's a way to configure a Spark application based on your cluster resources ("executor memory", "number of executors", and "executor cores"). I'm wondering if there is a way to do it considering the input data size. What would happen if the data…
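Spark has no built-in sizing from input bytes, but a common back-of-envelope rule (an assumption, not something Spark enforces) is roughly one task per 128 MB input split, with executor count following from the desired parallelism:

```shell
# Hypothetical sizing arithmetic: 10 GB of input, ~128 MB per partition,
# 4 cores per executor. Ceiling division via (a + b - 1) / b.
input_mb=10240
partition_mb=128
cores_per_executor=4
partitions=$(( (input_mb + partition_mb - 1) / partition_mb ))
executors=$(( (partitions + cores_per_executor - 1) / cores_per_executor ))
echo "$partitions $executors"   # prints: 80 20
```

The resulting numbers are a starting point only; skewed data or wide shuffles can make the real requirement very different.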
Spark cluster on Kubernetes without spark-submit
1 vote · 1 answer
I have a Spark application and want to deploy it on a Kubernetes cluster. Following the documentation below, I have managed to create an empty Kubernetes cluster and generated a Docker image using the Dockerfile provided under…
— asked by adesai
spark-submit DependencyUtils: Local jar does not exists, skipping
1 vote · 0 answers
Context: Windows 10 / Linux Ubuntu 8 LTS. Spark: 3.3.0 (spark-3.3.0-bin-hadoop3). I'm running spark-submit with a fat jar, but I receive "jar does not exists" from DependencyUtils even though the JAR is in place. As a consequence the mainClass is not…
— asked by LorenzoGi
PySpark virtual environment issue in spark-submit local mode
1 vote · 0 answers
I am trying to run a Python program using spark-submit in local mode with a virtual environment for Python, and it still runs without failing even when pyspark is not installed in the virtual env. Details of what I have tried are below for…
Exit code of spark-submit is 0 even when the Spark k8s driver pod fails (cluster mode)
1 vote · 0 answers
After executing a spark-submit command in Kubernetes in cluster mode (--deploy-mode cluster), it always gives exit code 0 (success) even when the driver pod has failed. Ideally, the main pod should also fail (i.e. go to state 'Error') if the…
— asked by user
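One workaround sketch for this behavior (pod name, API server URL, and jar path hypothetical) is to pin the driver pod's name, inspect its phase after spark-submit returns, and propagate failure explicitly:

```shell
spark-submit --master k8s://https://kube-apiserver:6443 --deploy-mode cluster \
  --conf spark.kubernetes.driver.pod.name=my-app-driver \
  local:///opt/spark/jars/app.jar
# spark-submit may exit 0 regardless, so check the driver pod ourselves:
phase=$(kubectl get pod my-app-driver -o jsonpath='{.status.phase}')
[ "$phase" = "Succeeded" ] || exit 1
```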
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources error
1 vote · 1 answer
I am new to Spark. I start Spark using /opt/spark/bin/spark-submit --jars mariadb-java-client-3.0.5.jar --master spark://neem-2:7077 sparksql.py but I get this error and am stuck with it; I would really appreciate it if you can…
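This error usually means either that no workers are registered with the master or that the application asks for more memory or cores than any single worker offers. One sketch, reusing the command from the question with an explicitly small resource request (values illustrative):

```shell
# Ask for less than the smallest worker can offer, then scale up.
/opt/spark/bin/spark-submit \
  --master spark://neem-2:7077 \
  --executor-memory 1g \
  --total-executor-cores 2 \
  --jars mariadb-java-client-3.0.5.jar \
  sparksql.py
```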
Exception in thread "main" org.apache.spark.SparkException: Driver cores must be a positive number
1 vote · 0 answers
I have a problem submitting a task to Mesos in cluster mode. First I used this syntax to run cluster mode on Mesos: $ cd spark $ ./sbin/start-mesos-dispatcher.sh --master mesos://10.2.3.95:5050 After that I submitted the task to Mesos using…
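In Mesos cluster mode the dispatcher launches the driver, so the submission needs an explicit positive driver core count. A sketch (dispatcher URL, class, and jar URL are hypothetical; in cluster mode the jar must be at a location the dispatcher can reach, hence a remote URL):

```shell
./bin/spark-submit \
  --master mesos://dispatcher-host:7077 \
  --deploy-mode cluster \
  --driver-cores 1 \
  --driver-memory 1g \
  --class com.example.Main \
  http://repo.example.com/app.jar
```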
Running spark-submit with spark-avro installed locally on a Mac or PC
1 vote · 0 answers
I am really struggling with this one. I have spent a lot of time searching for an answer in the Spark manual and Stack Overflow posts, and I really need help. I've installed Apache Spark on my Mac to build and debug PySpark code locally. However, in my PySpark code…
— asked by bda
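spark-avro is an external module, so it is not on the classpath of a plain local install; pulling it in with --packages is the usual route. A sketch (the artifact's Scala/Spark versions must match your build; 2.12/3.3.0 assumed here, and the script name is hypothetical):

```shell
spark-submit \
  --packages org.apache.spark:spark-avro_2.12:3.3.0 \
  my_pyspark_job.py
```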