Questions tagged [spark-submit]

spark-submit is the script used to launch apache-spark applications written in, e.g., Java, Scala, or Python.

More information about spark-submit can be found in the official Spark documentation on submitting applications (https://spark.apache.org/docs/latest/submitting-applications.html).

611 questions
2 votes, 1 answer

Can run code in pyspark shell but the same code fails when submitted with spark-submit

I am a Spark amateur, as you will notice in the question. I am trying to run very basic code on a Spark cluster (created on Dataproc). I SSH into the master, create a pyspark shell with pyspark --master yarn, and run the code - success. Run the…
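When the shell and spark-submit diverge like this, a first sanity check is to submit the same file against the same master the shell used. A minimal sketch, where job.py is a hypothetical script name:

    # Submit against YARN in client mode, mirroring `pyspark --master yarn`
    spark-submit \
      --master yarn \
      --deploy-mode client \
      job.py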
2 votes, 1 answer

How are spark jobs submitted in cluster mode?

I know there is information worth 10 Google pages on this, but all of them tell me to just put --master yarn in the spark-submit command. But in cluster mode, how can my local laptop even know what that means? Let us say I have my laptop and a…
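For context: spark-submit does not discover the cluster by magic; with --master yarn it reads the Hadoop client configuration on the submitting machine to locate the YARN ResourceManager. A minimal sketch, assuming the cluster's config files have been copied to the laptop (app.py is a placeholder):

    # yarn-site.xml and core-site.xml in this directory tell spark-submit
    # where the YARN ResourceManager lives
    export HADOOP_CONF_DIR=/path/to/cluster-conf
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      app.py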
2 votes, 1 answer

How to pass env variables in dataproc submit command?

I want to be able to set the following env variables while submitting a job via dataproc submit: SPARK_HOME, PYSPARK_PYTHON, SPARK_CONF_DIR, and HADOOP_CONF_DIR. How can I achieve that?
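One avenue worth noting: Spark itself exposes properties for setting environment variables on executors and on the YARN application master, and Dataproc forwards --properties through to Spark. A sketch, with a hypothetical cluster name and Python path:

    gcloud dataproc jobs submit pyspark job.py \
      --cluster=my-cluster \
      --properties=spark.executorEnv.PYSPARK_PYTHON=/usr/bin/python3,spark.yarn.appMasterEnv.PYSPARK_PYTHON=/usr/bin/python3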
2 votes, 0 answers

Is this an error that occurred after the spark operation?

I ran the following command: $ spark-submit --master yarn --deploy-mode cluster pi.py The log below then prints continuously: ... 2021-12-23 06:07:50,158 INFO yarn.Client: Application report for application_1640239254568_0002 (state:…
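That repeating line is the YARN client polling for the application's status, which is normal in itself; it usually only signals trouble if the state never leaves ACCEPTED (commonly a queue or resource-allocation problem). Two standard commands for digging further, using the application ID from the question:

    yarn application -status application_1640239254568_0002
    yarn logs -applicationId application_1640239254568_0002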
2 votes, 1 answer

How do I use JARs stored in Artifactory in spark submits?

I am trying to configure my spark-submits to use JARs that are stored in Artifactory. I've tried a few ways to do this. Attempt 1: changing the --jars parameter to point to the HTTPS endpoint. Result 1: 401 error. Credentials are being passed like…
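For dependencies published as Maven artifacts, one known pattern is to let spark-submit resolve them through Ivy with --packages and point --repositories at the private repository, embedding credentials in the URL. A sketch with placeholder coordinates and credentials:

    spark-submit \
      --repositories https://USER:TOKEN@artifactory.example.com/artifactory/libs-release \
      --packages com.example:my-lib:1.0.0 \
      app.py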
2 votes, 0 answers

spark-submit on Openshift to use specific Worker nodes

I am trying to get spark-submit on OpenShift to use specific worker nodes. Below is my command: ./spark/bin/spark-submit \ --master xx:6443 \ --deploy-mode cluster \ --name \ --class com.xxx \ --conf spark.executor.instances=2 \ --conf…
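On Kubernetes (which OpenShift builds on), Spark supports node-selector configuration, so the driver and executor pods are scheduled only onto nodes carrying a given label. A sketch with a hypothetical API endpoint, label, class, and jar path:

    spark-submit \
      --master k8s://https://api.cluster.example:6443 \
      --deploy-mode cluster \
      --conf spark.kubernetes.node.selector.node-role=spark-worker \
      --class com.example.Main \
      local:///opt/app/my-app.jar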
2 votes, 1 answer

How to skip Airflow SparkSubmitOperator task based on exit code that my Spark program returns?

My spark-submit application runs a query and returns a different exit code depending on the dataset state. Is it possible to skip downstream tasks right after my spark-submit operator? I am thinking about the skip_exit_code feature of BashOperator,…
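SparkSubmitOperator has no built-in skip-on-exit-code option, but its hook raises an AirflowException whose message typically includes the return code, so a thin subclass can translate a chosen code into a skip. A sketch, assuming the apache-spark Airflow provider is installed; the exit code 42 and the message matching are hypothetical and fragile:

    from airflow.exceptions import AirflowException, AirflowSkipException
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    class SkippableSparkSubmitOperator(SparkSubmitOperator):
        """Skip this task (and, with default trigger rules, its downstream) on a chosen exit code."""
        def execute(self, context):
            try:
                super().execute(context)
            except AirflowException as err:
                # The hook's error message contains the spark-submit return code
                if "42" in str(err):
                    raise AirflowSkipException("Dataset not ready; skipping")
                raise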
2 votes, 1 answer

Why is driver memory not in my Spark context configuration?

When I run the following command: spark-submit --name "My app" --master "local[*]" --py-files main.py --driver-memory 12g --executor-memory 12g, with the following code in my main.py: sc =…
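Two details matter in a command like this: the application script is a positional argument rather than a --py-files value (--py-files is for extra dependencies), and driver memory must be fixed before the driver JVM starts, so it belongs on the command line or in spark-defaults.conf rather than in a SparkConf built inside the job. A corrected sketch:

    spark-submit \
      --name "My app" \
      --master "local[*]" \
      --driver-memory 12g \
      main.py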
2 votes, 0 answers

Import error occurs while using pyspark udf

I'm trying to run a Spark application using spark-submit. I've created the following udf: from pyspark.sql.functions import udf from pyspark.sql.types import StringType from tldextract import tldextract @udf(StringType()) def get_domain(url): ext…
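A UDF's imports are executed on the executors, so a package such as tldextract has to be importable there as well, not just on the driver. One common approach is shipping the dependency with the job; a sketch with hypothetical paths:

    # Bundle the package (and its dependencies) and ship it to the executors
    pip install tldextract -t deps/
    (cd deps && zip -r ../deps.zip .)
    spark-submit --py-files deps.zip main.py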
2 votes, 0 answers

submit job with pandas in a zip file

I have two libraries, pandas and utils (my own library), that I want to import in my code. While testing I found that pandas does not work this way. Using boto3 and requests (without them being preinstalled on the cluster), it works by creating two zip files: libs.zip:…
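The likely reason pandas resists this treatment is that it ships compiled C extensions, which Python cannot import from a zip archive the way it can pure-Python packages like boto3 or requests. The approach in Spark's own Python package-management docs is to ship a packed virtualenv instead; a sketch with hypothetical names:

    python -m venv pyspark_env && source pyspark_env/bin/activate
    pip install pandas venv-pack
    venv-pack -o pyspark_env.tar.gz
    # executors unpack the archive as ./environment
    export PYSPARK_PYTHON=./environment/bin/python
    spark-submit --archives pyspark_env.tar.gz#environment main.py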
2 votes, 1 answer

How to access the kubectl forwarded port on a Spark Kubernetes cluster from spark-submit?

I have a Spark cluster running on an in-house Kubernetes cluster (managed with Rancher). Our company's configuration of the cluster doesn't allow services to be accessed via spark://SERVICE_NAME.namespace.svc.domain..... We created…
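The general shape of the port-forwarding approach: kubectl forwards the master service's port to localhost, and spark-submit then targets the forwarded address. A sketch with hypothetical service name, namespace, and port:

    kubectl port-forward svc/spark-master 7077:7077 -n spark &
    spark-submit --master spark://localhost:7077 app.py

One caveat: the executors still need a network route back to the driver, which a one-way forward by itself does not provide.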
2 votes, 1 answer

FileNotFoundException on submitting Spark Jobs to remote

I've created an environment where I've set up 3 Docker containers: one for Airflow using the puckel/docker-airflow image, with Spark and Hadoop additionally installed. The other two containers basically imitate a Spark master and worker (used…
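With a standalone master, a FileNotFoundException at submit time often means the application file's path only resolves inside one container. Making the file visible at the same path everywhere is the usual fix; a sketch with hypothetical container names and paths:

    docker cp my_job.py spark-master:/opt/jobs/my_job.py
    docker cp my_job.py spark-worker:/opt/jobs/my_job.py
    spark-submit --master spark://spark-master:7077 /opt/jobs/my_job.py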
2 votes, 0 answers

How to create a Python egg and spark-submit it?

I must be making a wrong assumption, because I can't find a way to solve this. I want to spark-submit a .egg file. That should be: spark-submit --py-files mypkg.egg main.py argv1 argv2, where main.py only needs the .egg file. But when I execute this, I get:…
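For completeness, the packaging side of this: an egg is built from a standard setuptools project, and the resulting file under dist/ is what --py-files expects. A sketch with a hypothetical package name and version:

    # run from the project root, where setup.py describes the mypkg package
    python setup.py bdist_egg        # writes dist/mypkg-0.1-py3.x.egg
    spark-submit --py-files dist/mypkg-0.1-py3.x.egg main.py argv1 argv2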
2 votes, 2 answers

How should you run a Jupyter notebook on a Spark EMR cluster?

EDIT: This question was about how you should define parameters for a Python/Jupyter-notebook file in order to do a spark-submit on an Amazon EMR Spark cluster... Before: I am sorry for my dumb questions, but I am pretty much a newbie and I am stuck on the…
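One workable route on EMR is to keep the notebook for development, convert it to a plain script, and submit that, passing parameters as ordinary command-line arguments. A sketch with hypothetical file and parameter names:

    jupyter nbconvert --to script analysis.ipynb    # writes analysis.py
    spark-submit --master yarn analysis.py --input s3://bucket/path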
2 votes, 0 answers

How to resolve the "Spark-Submit Error: Failed to load class" issue?

I am new to Spark/Scala. I tried to execute a sample Scala program and am facing an issue with it. Please find the steps I tried so far. I have created the Scala class (object) Sivam under the package com.dhana.MyScalaDemo. Step 1: Started Master…
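For reference, "Failed to load class" usually points at a mismatch between --class and the fully qualified object name, or at a jar that does not actually contain that class. A sketch using the names from the question, with a hypothetical master host and jar path:

    spark-submit \
      --class com.dhana.MyScalaDemo.Sivam \
      --master spark://<master-host>:7077 \
      target/scala-2.12/myscalademo_2.12-0.1.jar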