Questions tagged [spark-submit]

spark-submit is the script shipped with Apache Spark for launching Spark applications, written in e.g. Java, Scala or Python, on a cluster.

More information about spark-submit can be found in the official Spark documentation.

611 questions
1
vote
1 answer

How to export a Datastax graph based on a specific traversal using DseGraphFrame

I would like to export a DSE graph via a Spark job, as per https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/graphAnalytics/dseGraphFrameExport.html. All of this works fine within the spark-shell; I want to do this in Java…
zXor
  • 208
  • 1
  • 10
1
vote
1 answer

Run a Spark job using the Databricks REST API

I am using the Databricks REST API to run Spark jobs. I am using the following commands: curl -X POST -H "Authorization: XXXX" 'url/api/2.0/jobs/create' -d ' {"name":"jobname","existing_cluster_id":"0725-095337-jello70","libraries": [{"jar":…
scalacode
  • 1,096
  • 1
  • 16
  • 38
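The same call can be made from Python with the requests library; a minimal sketch, assuming a workspace URL, a personal access token, and a jar already uploaded to DBFS (all values below are placeholders, not the asker's):

```python
import requests

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "dapi..."                                  # placeholder personal access token

# Create the job; Databricks expects "Bearer <token>" in the Authorization header.
resp = requests.post(
    f"{HOST}/api/2.0/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "jobname",
        "existing_cluster_id": "0725-095337-jello70",
        "libraries": [{"jar": "dbfs:/path/to/app.jar"}],            # placeholder jar path
        "spark_jar_task": {"main_class_name": "com.example.Main"},  # placeholder class
    },
)
resp.raise_for_status()
job_id = resp.json()["job_id"]

# Trigger a run of the newly created job.
requests.post(f"{HOST}/api/2.0/jobs/run-now",
              headers={"Authorization": f"Bearer {TOKEN}"},
              json={"job_id": job_id}).raise_for_status()
```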
1
vote
1 answer

Where does a Spark job run in a cluster of 2 nodes when the spark-submit configuration can easily be accommodated on a single node? (cluster mode)

The Spark cluster has 2 worker nodes. Node 1: 64 GB, 8 cores. Node 2: 64 GB, 8 cores. Now suppose I submit a Spark job using spark-submit in cluster mode with 2 executors, each executor with 32 GB memory and 4 cores. Now my question is, as the above…
1
vote
2 answers

How to pass an external resource yml/property file while running a Spark job on a cluster?

I am using spark-sql version 2.4.1, Jackson jars & Java 8. In my Spark program/job I am reading a few configurations/properties from an external "conditions.yml" file which is placed in the "resource" folder of my Java project, as below: ObjectMapper mapper =…
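The usual pattern is to ship the file with --files and resolve it at runtime rather than reading it from the jar's resources. The question is Java/Jackson, but the mechanism is identical in PySpark; a minimal sketch (file name taken from the question, paths are placeholders):

```python
# Submitted e.g. as:
#   spark-submit --files /local/path/conditions.yml app.py
from pyspark import SparkFiles
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("config-demo").getOrCreate()

# SparkFiles.get() returns the local path where Spark staged the shipped file,
# both on the driver and on the executors.
with open(SparkFiles.get("conditions.yml")) as f:
    config_text = f.read()
```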
1
vote
0 answers

How to get the parameters passed to the spark-submit command in Python

I am using the spark-submit command to run Python code using PySpark. Something like: spark-submit --master yarn --num-executors 15 --executor-cores 6 test.py Is there any way I can get the parameters I am using in the spark-submit command in Python…
Dan R
  • 71
  • 7
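One way (a sketch): the flags given to spark-submit end up in the SparkConf, so the driver can read them back via getAll(); application-level arguments placed after the script name arrive in sys.argv as usual.

```python
# test.py, launched per the question as:
#   spark-submit --master yarn --num-executors 15 --executor-cores 6 test.py
import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The effective configuration, including values set on the command line
# (e.g. spark.executor.cores), is visible through the SparkConf:
for key, value in spark.sparkContext.getConf().getAll():
    print(key, "=", value)

print("script arguments:", sys.argv[1:])  # anything placed after test.py
```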
1
vote
1 answer

How to spark-submit a .py file stored in a GCP bucket?

I am trying to run this .py file. I have copied the dsgd_mf.py file to the GCP bucket. The input data file required is also in my bucket. How do I spark-submit this and get the output?…
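On Dataproc this is typically done by pointing the job submission directly at the gs:// path; a sketch assuming a Dataproc cluster named my-cluster in us-central1 (cluster name, region, and bucket paths are all placeholders):

```python
import subprocess

# Dataproc reads the driver script straight from the bucket; the paths after
# "--" are passed to the script as ordinary command-line arguments.
subprocess.run([
    "gcloud", "dataproc", "jobs", "submit", "pyspark",
    "gs://my-bucket/dsgd_mf.py",            # placeholder bucket path
    "--cluster=my-cluster", "--region=us-central1",
    "--",
    "gs://my-bucket/input.txt",             # placeholder input
    "gs://my-bucket/output",                # placeholder output prefix
], check=True)
```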
1
vote
0 answers

How --py-files works internally in PySpark

I am new to PySpark. I have used --py-files as below in the spark-submit command to copy all files to the worker nodes. spark-submit --master yarn-client --driver-memory 4g --py-files /home/valli/pyFiles.zip /home/valli/main.py In the logs I observed that…
Valli69
  • 8,856
  • 4
  • 23
  • 29
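In short, Spark ships the archive to each node's working directory and inserts it into the Python path, so modules inside the zip become importable on the executors as well as the driver. A sketch using the command from the question (helpers and helpers.transform are hypothetical names standing in for whatever is packaged in pyFiles.zip):

```python
# main.py, submitted per the question as:
#   spark-submit --master yarn-client --driver-memory 4g \
#       --py-files /home/valli/pyFiles.zip /home/valli/main.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("py-files-demo").getOrCreate()

# pyFiles.zip was shipped to every node and added to sys.path, so this import
# also works inside functions executed on the executors.
import helpers  # hypothetical module packaged in pyFiles.zip

rdd = spark.sparkContext.parallelize(range(10))
print(rdd.map(helpers.transform).collect())  # hypothetical helpers.transform
```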
1
vote
1 answer

spark-submit works in yarn-cluster mode but SparkLauncher doesn't, with the same params

I'm able to submit a Spark job through spark-submit, however when I try to do the same programmatically using SparkLauncher, it gives me nothing (I don't even see a Spark job on the UI). Below is the scenario: I have a server (say hostname:…
ni_i_ru_sama
  • 304
  • 1
  • 13
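SparkLauncher is a thin programmatic wrapper that forks the same spark-submit process, and by default its output is not surfaced anywhere obvious, which is why failures can look silent. A debugging sketch in Python that launches the equivalent process and keeps its output visible (master, class, and jar path are placeholders):

```python
import subprocess

# Run the same submission SparkLauncher would perform, capturing stderr/stdout
# so the actual failure reason is not swallowed.
proc = subprocess.Popen(
    ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster",
     "--class", "com.example.Main", "/path/to/app.jar"],   # placeholders
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
)
for line in proc.stdout:
    print(line.decode().rstrip())
print("exit code:", proc.wait())
```

On the Java side, SparkLauncher's redirectOutput/redirectError methods serve the same purpose; without them, the launched process's diagnostics are easy to miss.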
1
vote
2 answers

Spark 2.4.0 submit in cluster mode - why is the REST submission server required?

I have a standalone Spark 2.4.0 cluster to which I need to deploy an app, passing some extra Java options (to both driver and executors). To do that I use spark.driver.extraJavaOptions and spark.executor.extraJavaOptions described here. It works…
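For context, standalone cluster-mode deploys typically go through the REST submission endpoint (port 6066 by default) rather than the legacy 7077 port, and the extra Java options travel as ordinary confs. A sketch of such a submission (host, option values, and jar path are placeholders):

```python
import subprocess

subprocess.run([
    "spark-submit",
    "--master", "spark://master-host:6066",   # REST submission endpoint (placeholder host)
    "--deploy-mode", "cluster",
    "--conf", "spark.driver.extraJavaOptions=-Dapp.env=prod",     # placeholder option
    "--conf", "spark.executor.extraJavaOptions=-Dapp.env=prod",   # placeholder option
    "/path/to/app.jar",                        # placeholder jar
], check=True)
```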
1
vote
2 answers

What is the difference between defining Spark Master in the CLI vs defining 'master' in the Spark application code?

What is the difference between the spark-submit "--master" defined in the CLI and defining the master in the Spark application code? In Spark we can specify the master URI in the application code, like below: Or we can specify the master URI in the…
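The practical difference, sketched below: a master set in code via the builder takes precedence over --master on the command line, because properties set directly on the SparkConf have the highest precedence, so hard-coding it ties the application to one environment.

```python
from pyspark.sql import SparkSession

# Hard-coded master: this wins over "spark-submit --master ...", since values
# set directly on the SparkConf take the highest precedence.
spark = (SparkSession.builder
         .master("local[4]")
         .appName("pinned-to-local")
         .getOrCreate())

# Portable alternative: omit .master(...) entirely and choose it at submit time:
#   spark-submit --master yarn app.py
```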
1
vote
1 answer

Passing multiple system properties to spark-submit

I am trying to run a Spark job using spark-submit on Windows. I am executing the below spark-submit command from the command prompt. spark-submit --driver-class-path %FILE_NAME%\config --files…
Anand
  • 20,708
  • 48
  • 131
  • 198
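Multiple -D system properties are usually passed as a single quoted extraJavaOptions value per side (driver and executor). A sketch, with hypothetical property names and a placeholder jar, shown launched from Python for readability:

```python
import subprocess

# Two system properties for the driver in one quoted conf value; the executor
# side gets its own conf. Names and values are placeholders.
subprocess.run([
    "spark-submit",
    "--conf", "spark.driver.extraJavaOptions=-Denv=dev -Dconfig.dir=C:\\configs",
    "--conf", "spark.executor.extraJavaOptions=-Denv=dev",
    "app.jar",   # placeholder application jar
], check=True)
```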
1
vote
1 answer

Spark 2.4 com.databricks.spark.avro troubleshooting

I have a Spark job that I usually submit to a Hadoop cluster from a local machine. When I submit it with Spark 2.2.0 it works fine, but it fails to start when I submit it with version 2.4.0. Only the SPARK_HOME makes the difference. drwxr-xr-x 18…
Antalagor
  • 428
  • 4
  • 10
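A likely cause in 2.4: the external com.databricks.spark.avro package was folded into Spark as the built-in spark-avro module, which is not bundled with the distribution and must be pulled in explicitly. A sketch of a submission that keeps old Databricks-Avro code working (package coordinates assume Scala 2.11; the legacy flag maps the old source name onto the built-in module):

```python
import subprocess

subprocess.run([
    "spark-submit",
    "--packages", "org.apache.spark:spark-avro_2.11:2.4.0",
    "--conf", "spark.sql.legacy.replaceDatabricksSparkAvro.enabled=true",
    "app.jar",   # placeholder application jar
], check=True)
```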
1
vote
1 answer

spark-submit - An existing connection was forcibly closed by the remote host [on master node]

I have set up a Spark cluster locally on my Windows 7 machine. It has a master and a worker node. I have created a simple jar using sbt compile + sbt package and am trying to submit it to the Spark master node using spark-submit. Currently both the…
ankur
  • 557
  • 1
  • 10
  • 37
1
vote
0 answers

PySpark setup with Jupyter notebook

I am relatively new to using PySpark and have inherited a data pipeline built in Spark. There is a main server that I connect to, and I execute the Spark job via terminal using spark-submit, which then executes via master yarn in cluster deploy…
zad0xlik
  • 183
  • 1
  • 4
  • 14
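One common way (among several) to reuse such a spark-submit environment inside a notebook is to locate the existing installation with findspark and build the session in code; a sketch assuming SPARK_HOME is set on the server:

```python
import findspark
findspark.init()  # locates the Spark installation via SPARK_HOME

from pyspark.sql import SparkSession

# Mirror the pipeline's spark-submit flags as builder settings.
spark = (SparkSession.builder
         .master("yarn")            # was: --master yarn on the command line
         .appName("notebook-session")
         .getOrCreate())
```

Note that a notebook driver necessarily runs in client mode; the pipeline's cluster deploy mode does not apply when the driver is the notebook kernel itself.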
1
vote
1 answer

spark-submit: 403 error, client system:anonymous error

From time to time, when I submit a Spark job to a Google Kubernetes cluster, I get 401 Unauthorized, so I run gcloud container clusters get-credentials my-cluster, but it is almost always followed by a 403 error saying client system:anonymous etc., but…
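A first debugging step (a sketch; cluster name and zone are placeholders): refresh the kubeconfig credentials, then check what identity the API server actually sees before resubmitting, since a 403 for system:anonymous usually means no usable credentials were sent with the request.

```python
import subprocess

# Refresh the kubeconfig credentials for the cluster.
subprocess.run(["gcloud", "container", "clusters", "get-credentials",
                "my-cluster", "--zone", "us-central1-a"], check=True)

# Verify the identity the API server sees; "no" (non-zero exit) means the
# kubeconfig user lacks RBAC permissions or the token is stale, and the
# spark-submit 403 will persist until that is fixed.
subprocess.run(["kubectl", "auth", "can-i", "create", "pods"])
```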