Questions tagged [spark-submit]

spark-submit is a script that runs Apache Spark code written in, e.g., Java, Scala, or Python.

More information about spark-submit can be found in the Apache Spark documentation on submitting applications.

611 questions
1
vote
1 answer

How to process mainframe numbers where "{" is the last character

I have mainframe file data like the following: 000000720000{ I need to parse the data and load it into a Hive table as 72000. The field above is the income column, and the "{" sign denotes a positive amount. Datatype used while creating table income…
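The trailing "{" in the excerpt above looks like a zoned-decimal sign overpunch, where the last character carries both the final digit and the sign. Below is a minimal Python sketch of decoding such a field. It assumes the common EBCDIC-derived mapping ("{" and "A" to "I" for positive 0 to 9, "}" and "J" to "R" for negative 0 to 9) and two implied decimal places, which is inferred from the 72000 in the example rather than stated in the question. The function name `decode_overpunch` and the `scale` parameter are hypothetical.

```python
from decimal import Decimal

# Assumed overpunch mapping: last character encodes sign and final digit.
POSITIVE = {"{": 0, **{chr(ord("A") + i): i + 1 for i in range(9)}}  # A..I -> 1..9
NEGATIVE = {"}": 0, **{chr(ord("J") + i): i + 1 for i in range(9)}}  # J..R -> 1..9


def decode_overpunch(field: str, scale: int = 0) -> Decimal:
    """Decode a signed zoned-decimal string; `scale` is the number of implied decimals."""
    if not field:
        raise ValueError("empty field")
    body, last = field[:-1], field[-1]
    if last in POSITIVE:
        sign, digit = 1, POSITIVE[last]
    elif last in NEGATIVE:
        sign, digit = -1, NEGATIVE[last]
    elif last.isdigit():
        # Unsigned field: treat as positive.
        sign, digit = 1, int(last)
    else:
        raise ValueError(f"unexpected sign character: {last!r}")
    # int() raises ValueError if the leading characters are not all digits.
    unscaled = sign * int(body + str(digit))
    return Decimal(unscaled).scaleb(-scale)


if __name__ == "__main__":
    print(decode_overpunch("000000720000{", scale=2))
```

In a Spark job, a function like this could be wrapped in a UDF and applied to the raw column before writing to Hive. The actual scale should come from the copybook that defines the field.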
1
vote
1 answer

Error while running spark job because of the native files missing

While running a spark-submit job, I was getting this error: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support. What I did was copy libhadoop.so and libsnappy.so inside…
John Humanyun
  • 915
  • 3
  • 10
  • 25
1
vote
0 answers

How to clear driver and executor memory blocked data due to broadcast variable in spark scala during spark-submit job execution

I have a series of Spark jobs executed in a single class, as shown below: SparkMainClass: Job1 (Using 4 new dataframes in broadcast join) Job2 (Using 3 new dataframes in broadcast join) Job3 (Using 4 new dataframes in broadcast join) Job4 (Using 2…
Ku002
  • 117
  • 1
  • 2
  • 14
1
vote
1 answer

Spark requests more cores than asked when calling the POST Livy batch API in Azure Synapse

I have an Azure Synapse Spark cluster with 3 nodes of 4 vCores and 32 GB memory each. I am trying to submit a Spark job using the Azure Synapse Livy batch APIs. The request looks like this: curl --location --request POST…
1
vote
0 answers

How to manage multiple environments in pyspark clusters?

I want to: have multiple Python environments in my PySpark Dataproc cluster; specify, while submitting the job, which environment I want to execute my submitted job in. I want to persist the environments so that I can use them on an as-needed basis. I…
figs_and_nuts
  • 4,870
  • 2
  • 31
  • 56
1
vote
1 answer

spark-submit throws Exception in thread "main" java.lang.IllegalStateException: Cannot find any build directories

Spark-submit command: [root@d03db3cedc5a opt]# bash -x $SPARK_HOME/bin/spark-submit --master spark://analytics-seed:7077 --py-files $SPARK_HOME/hello_world.py + '[' -z /opt/spark ']' + export PYTHONHASHSEED=0 + PYTHONHASHSEED=0 + exec…
1
vote
0 answers

SparkSubmitOperator with Multiple .jars

I am trying to write a data pipeline that reads a .tsv file from Azure Blob Storage and writes the data to a MySQL database. I have a sensor that looks for a file with a given prefix within my storage container and then a SparkSubmitOperator which…
Minura Punchihewa
  • 1,498
  • 1
  • 12
  • 35
1
vote
1 answer

PySpark - MetadataFetchFailedException when calculating TF-IDF

I am working on a dataset of initially 569 MB, calculating the TF-IDF metric. Although I get results in the end, I keep getting the error below: WARN scheduler.TaskSetManager: Lost task 13.0 in stage 11.0 (TID 84, X.X.X.X, executor 0):…
user2829319
  • 239
  • 4
  • 16
1
vote
1 answer

Pass Typesafe config file to the Spark Submit Job in Azure Databricks

I am trying to pass a Typesafe config file to the spark submit task and print the details in the config file. import org.slf4j.{Logger, LoggerFactory} import com.typesafe.config.{Config, ConfigFactory} import org.apache.spark.sql.SparkSession …
1
vote
1 answer

Spark on YARN - Cannot allocate containers as requested resource is greater than maximum allowed allocation

Error : YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details. 2021-10-12 15:15:30,201 Diagnostics message: Uncaught exception:…
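The error in this question's title usually comes down to arithmetic: the container Spark asks YARN for is the executor memory plus a memory overhead, and that sum must not exceed `yarn.scheduler.maximum-allocation-mb`. The Python sketch below checks this under stated assumptions: the overhead defaults to the larger of 384 MiB and 10% of executor memory (Spark's documented default for JVM executors in recent versions, which may differ in other setups), and the helper names are hypothetical. The actual values from the truncated log above are not known.

```python
# Assumed defaults for spark.executor.memoryOverhead when it is not set explicitly.
MIN_OVERHEAD_MB = 384
OVERHEAD_FACTOR = 0.10


def container_request_mb(executor_memory_mb, overhead_mb=None):
    """Approximate memory (MiB) requested from YARN for one executor container."""
    if overhead_mb is None:
        overhead_mb = max(MIN_OVERHEAD_MB, int(executor_memory_mb * OVERHEAD_FACTOR))
    return executor_memory_mb + overhead_mb


def fits_in_yarn(executor_memory_mb, max_allocation_mb, overhead_mb=None):
    """True if the request is within yarn.scheduler.maximum-allocation-mb."""
    return container_request_mb(executor_memory_mb, overhead_mb) <= max_allocation_mb


if __name__ == "__main__":
    # An 8 GiB executor does not fit in an 8 GiB maximum allocation
    # once the overhead is added.
    print(container_request_mb(8192), fits_in_yarn(8192, 8192))
```

If the check fails, the usual options are to lower `--executor-memory` (or the overhead) or to raise the YARN maximum allocation on the cluster.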
1
vote
0 answers

Can't spark-submit jar file from hdfs on remote machine

I'm trying to connect to my remote cluster using spark-submit and run a jar file that I've put on hdfs. I have the following property in my $SPARK_HOME/libexec/conf/core-site.xml, which is also in $HADOOP_HOME/libexec/etc/hadoop/:
steven hurwitt
  • 183
  • 2
  • 15
1
vote
0 answers

spark-submit on AWS EMR Throws Exception

I run the following command on the master node of an AWS EMR cluster (release label: emr-6.1.0, hadoop distribution: Amazon 3.2.1) - % spark-submit \ --deploy-mode cluster \ --master yarn \ main.py It throws the following exception 21/10/01…
1
vote
0 answers

Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;

I use mvn clean package to create a jar package and no error is raised. After that I use spark-submit to execute the jar package, and the following error occurs: Exception in thread "main" java.lang.NoSuchMethodError:…
Yafei Wei
  • 19
  • 3
1
vote
1 answer

How to deal with error code 101 for Spark-submit on Kubernetes

I am trying to run the following code to submit a Spark application to a Kubernetes cluster: /opt/spark/bin/spark-submit --master k8s://https://:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi…
Tyler Mc
  • 228
  • 1
  • 7
1
vote
0 answers

Spark submit failing with Python virtualenv

I am trying to run a Python module using spark-submit on a Spark cluster. The package has certain dependencies which have been zipped in a virtualenv. I am using the command below to run it. export PYSPARK_PYTHON=./environment/bin/python spark-submit…
Richa Gaur
  • 11
  • 3
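For the virtualenv question above, the pattern described in Spark's Python package management documentation is to ship a packed environment with `--archives <archive>#<alias>` and point `PYSPARK_PYTHON` at the interpreter inside the unpacked alias directory. The Python sketch below only assembles the environment variable and argument list. It does not reproduce the asker's truncated command, and the archive name `pyspark_venv.tar.gz`, the script name `app.py`, and the helper `build_submit_command` are hypothetical.

```python
def build_submit_command(archive, app, alias="environment"):
    """Return (env, argv) for a spark-submit call that ships a packed virtualenv.

    `archive` is a packed environment (e.g. made with venv-pack); the part
    after '#' is the directory name it is unpacked to on the executors.
    """
    env = {"PYSPARK_PYTHON": f"./{alias}/bin/python"}
    argv = ["spark-submit", "--archives", f"{archive}#{alias}", app]
    return env, argv


if __name__ == "__main__":
    env, argv = build_submit_command("pyspark_venv.tar.gz", "app.py")
    print(env)
    print(" ".join(argv))
    # To actually launch it (requires Spark on PATH), something like:
    #   import os, subprocess
    #   subprocess.run(argv, env={**os.environ, **env}, check=True)
```

The alias after `#` has to match the directory used in `PYSPARK_PYTHON`; a mismatch between the two is one plausible cause of this kind of failure, though the excerpt does not show enough to confirm it.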