Questions tagged [apache-spark-standalone]

Use for questions related to Apache Spark standalone deploy mode (not local mode).

This tag should be used for questions specific to standalone deploy mode. Questions might include cluster orchestration in standalone mode, or standalone-specific features and configuration options.

Spark standalone mode is an alternative to running Spark on Mesos or YARN. It provides a simpler option than the more sophisticated resource managers, which can be useful on a dedicated Spark cluster (i.e. one not running other jobs).

"Standalone" speaks to the nature of running "alone" without an external resource manager.


164 questions
2 votes, 2 answers

spark scala: Performance degradation with a simple UDF over a large number of columns

I have a dataframe with 100 million rows and ~10,000 columns. The columns are of two types: standard (C_i) followed by dynamic (X_i). This dataframe was obtained after some processing, and the performance was fast. Now only 2 steps remain. Goal: A…
asked by Quiescent
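A hedged sketch of one common mitigation for this pattern, with a toy three-column schema and placeholder logic standing in for the real dataframe: call the UDF once over an array of the dynamic columns instead of once per column, so the query plan does not grow with the column count.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{array, col, udf}

    val spark = SparkSession.builder().master("local[*]").appName("udf-width").getOrCreate()
    import spark.implicits._

    // Toy stand-in for the real dataframe: three X_i columns instead of ~10,000
    val df = Seq((1.0, 2.0, 3.0)).toDF("X_1", "X_2", "X_3")

    val xCols = df.columns.filter(_.startsWith("X_")).map(col)
    val f = udf((xs: Seq[Double]) => xs.map(_ * 2.0)) // placeholder logic
    df.withColumn("X_out", f(array(xCols: _*))).show()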
2 votes, 1 answer

Spark not giving equal tasks to all executors

I am reading from a kafka topic which has 5 partitions. Since 5 cores are not sufficient to handle the load, I am repartitioning the input to 30. I have given 30 cores to my spark process, with 6 cores on each executor. With this setup I was…
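A sketch of the setup described, with broker and topic names assumed; it presumes the spark-sql-kafka-0-10 package is on the classpath:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-repartition").getOrCreate()

    // Broker and topic are placeholders; a 5-partition topic yields only
    // 5 input tasks, so repartition widens the downstream stages to 30
    val input = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .repartition(30)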
2 votes, 0 answers

ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up. - Spark standalone cluster

The Spark job (Scala/S3) worked fine for a few runs in a standalone cluster with spark-submit, but after a few runs it started giving the error below. There were no changes to the code; it makes a connection to the spark master, but the application is immediately getting…
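Not a diagnosis, but one thing worth checking in a standalone HA setup: the master URL should list every master, since the active one can move. A sketch with placeholder host names:

    import org.apache.spark.sql.SparkSession

    // With standalone HA, list all masters so the driver can find
    // whichever one is currently active on the RPC port
    val spark = SparkSession.builder()
      .appName("s3-job")
      .master("spark://master1:7077,master2:7077")
      .getOrCreate()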
2 votes, 0 answers

Shuffle file cleanup in Spark with the external shuffle service

We are using Spark 3.0.1 (standalone mode) with dynamic allocation and the external shuffle service. After switching to dedicated persistent disks we started getting "out of disk space" errors, so we looked into the /tmp folder and noticed many older…
asked by LiranBo
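A sketch of the knobs usually involved; the path and the worker-side settings are assumptions to verify against your deployment, since standalone worker cleanup is configured on the worker daemons, not in application code.

    import org.apache.spark.sql.SparkSession

    // Worker-side cleanup is set via SPARK_WORKER_OPTS on each worker, e.g.:
    //   -Dspark.worker.cleanup.enabled=true
    //   -Dspark.worker.cleanup.interval=1800      (seconds between sweeps)
    //   -Dspark.worker.cleanup.appDataTtl=604800  (per-app data TTL, seconds)
    // App side: move scratch space off /tmp (path is a placeholder)
    val spark = SparkSession.builder()
      .appName("shuffle-cleanup")
      .config("spark.local.dir", "/data/spark-scratch")
      .getOrCreate()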
2 votes, 1 answer

Spark Standalone: how to avoid sbt assembly and an uber-jar?

I have a build.sbt like this, for Spark programming: libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % "3.0.1" withSources(), "com.datastax.spark" %% "spark-cassandra-connector" % "3.0.0" withSources() ... ) As my program uses…
asked by Klun
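One hedged alternative to an assembly jar: let Spark resolve the dependencies from Maven at launch via spark.jars.packages (coordinates shown are assumed):

    import org.apache.spark.sql.SparkSession

    // Transitive dependencies are resolved and distributed to executors
    // at launch, so no assembly step is needed
    val spark = SparkSession.builder()
      .appName("no-uber-jar")
      .config("spark.jars.packages",
        "com.datastax.spark:spark-cassandra-connector_2.12:3.0.0")
      .getOrCreate()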
2 votes, 0 answers

How to force Spark 2.4.3 to use fixed ports for executors or could I ignore the issue that Spark uses random executor ports?

I am using Spark 2.4.3 on five nodes in client mode and standalone mode for testing purposes, and I am assigned a limited range of ports. Hence I have configured all the ports that are possible according to the docs, to avoid Spark taking arbitrary…
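A sketch of the port settings that are configurable here; the values are placeholders. As far as I can tell, the executors' own RPC endpoints still bind ephemeral ports in 2.4.x, so only the driver and block-manager ports can be pinned:

    import org.apache.spark.sql.SparkSession

    // spark.port.maxRetries bounds how far above each base port Spark searches
    val spark = SparkSession.builder()
      .appName("fixed-ports")
      .config("spark.driver.port", "40000")              // driver RPC
      .config("spark.driver.blockManager.port", "40001") // driver block manager
      .config("spark.blockManager.port", "40010")        // executors: 40010 + retries
      .config("spark.port.maxRetries", "16")
      .getOrCreate()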
2 votes, 1 answer

In Spark standalone mode, are the master and executors located on a single machine?

Does spark standalone mode mean that the executors and master run on a single machine? If so, how can it achieve parallelism? Is the value passed to the local setting of the Spark conf set to one in standalone mode to indicate that spark…
asked by user4473195
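The difference in a sketch: local[*] puts the driver and executors in one JVM on one machine, while a spark:// master URL distributes executors across whatever workers are registered, even if a worker happens to share a machine with the master (host name assumed):

    import org.apache.spark.SparkConf

    // Single machine, single JVM: driver and executor threads together
    val localConf   = new SparkConf().setMaster("local[*]").setAppName("local-mode")
    // Cluster: executors are launched on registered workers, wherever they run
    val clusterConf = new SparkConf().setMaster("spark://master-host:7077").setAppName("standalone")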
2 votes, 1 answer

Unit tests using SparkSession: SparkContext was shut down

We have a big project with multiple test suites, and every test suite has on average 3 tests. For our unit tests we use Spark standalone, so no YARN as a resource manager. Every test suite initializes a spark session: implicit val spark…
asked by Farah
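A sketch, assuming ScalaTest, of the usual per-suite lifecycle that avoids one suite observing a SparkContext stopped by another:

    import org.apache.spark.sql.SparkSession
    import org.scalatest.BeforeAndAfterAll
    import org.scalatest.funsuite.AnyFunSuite

    class ExampleSuite extends AnyFunSuite with BeforeAndAfterAll {
      // One session per suite; getOrCreate() builds a fresh one if a
      // previously shared context has been stopped
      implicit lazy val spark: SparkSession =
        SparkSession.builder().master("local[2]").appName("unit-tests").getOrCreate()

      override def afterAll(): Unit = {
        spark.stop() // stop exactly once, after all tests in the suite
        super.afterAll()
      }

      test("count") {
        assert(spark.range(10).count() == 10L)
      }
    }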
2 votes, 0 answers

Spark master continuously launches executors for a non-existent driver

The Spark application is deployed in standalone cluster mode with supervise enabled. During high-availability testing, when a rack with the driver instance was powered off (ungracefully), the spark master didn't know about the killed driver and application, and…
asked by eprabab
2 votes, 1 answer

spark.master configuration via REST job submission in standalone cluster is ignored

I have a standalone Spark cluster in HA mode (2 masters) and a couple of workers registered there. I submitted a spark job via the REST interface with the following details: { "sparkProperties": { "spark.app.name": "TeraGen3", …
asked by kans
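For reference, a sketch of a standalone REST submission from Scala, using the Java 11 HttpClient; the host, jar path, class name, and version values are all assumed, and 6066 is the default REST port:

    import java.net.URI
    import java.net.http.{HttpClient, HttpRequest, HttpResponse}

    object RestSubmit {
      def main(args: Array[String]): Unit = {
        // Hypothetical submission body; sparkProperties mirrors the question
        val body =
          """{
            |  "action": "CreateSubmissionRequest",
            |  "appResource": "hdfs:///jars/teragen.jar",
            |  "mainClass": "com.example.TeraGen3",
            |  "appArgs": [],
            |  "clientSparkVersion": "2.4.0",
            |  "environmentVariables": {},
            |  "sparkProperties": {
            |    "spark.app.name": "TeraGen3",
            |    "spark.master": "spark://master1:7077,master2:7077",
            |    "spark.jars": "hdfs:///jars/teragen.jar"
            |  }
            |}""".stripMargin
        val request = HttpRequest.newBuilder()
          .uri(URI.create("http://master1:6066/v1/submissions/create"))
          .header("Content-Type", "application/json")
          .POST(HttpRequest.BodyPublishers.ofString(body))
          .build()
        val response = HttpClient.newHttpClient()
          .send(request, HttpResponse.BodyHandlers.ofString())
        println(response.body())
      }
    }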
2 votes, 1 answer

Simple Spark job fails due to GC overhead limit

I've created a standalone spark (2.1.1) cluster on my local machines with 9 cores / 80G per machine (27 cores / 240G RAM in total). I've got a sample spark job that sums all the numbers from 1 to x. This is the code: package com.example import…
asked by Y. Eliash
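A sketch of computing the same sum without building a large local collection on the driver, which is a frequent cause of GC overhead errors in jobs like this (master URL assumed):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    val spark = SparkSession.builder()
      .master("spark://master-host:7077")
      .appName("sum-to-x")
      .getOrCreate()

    val x = 1000000000L
    // spark.range streams the sequence in partitions; nothing the size of x
    // is ever materialized on the driver
    val total = spark.range(1, x + 1).agg(sum("id")).first().getLong(0)
    println(total)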
2 votes, 1 answer

spark-shell on a multinode Spark cluster fails to spawn executors on a remote worker node

I installed a spark cluster in standalone mode with 2 nodes: on the first node the spark master is running, and on the other node a spark worker. When I try to run the spark shell on the worker node with word-count code it runs fine, but when I try to run the spark shell…
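Not a diagnosis, but connectivity back to the driver is the usual suspect: executors must be able to reach the spark-shell's driver process. A sketch of the relevant settings, with placeholder addresses:

    import org.apache.spark.sql.SparkSession

    // spark.driver.host must be an address reachable from every worker,
    // since executors connect back to the driver
    val spark = SparkSession.builder()
      .master("spark://master-host:7077")
      .config("spark.driver.host", "10.0.0.2")
      .config("spark.driver.bindAddress", "0.0.0.0") // listen on all interfaces
      .getOrCreate()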
2 votes, 0 answers

How to change the state of a Spark application

All my "Completed Applications" are in a FINISHED "State" in the Spark UI. Even when my "Completed Driver" is in a FAILED "State" (because in the scala code I do a System.exit(1) in case of any exceptions), its associated "Completed Application" is…
asked by reno
2 votes, 1 answer

Executor unable to pick up the postgres driver in a Spark standalone cluster

I was submitting a Play application to a spark 2.1 standalone cluster. The postgres dependency is also added in the Play application, and the application works with local spark libraries. But at run time on the standalone cluster it gives me the error: …
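A sketch of shipping the JDBC driver to executors explicitly (jar path, URL, and table are assumed); a build dependency alone only puts the driver on the application's own classpath:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("postgres-read")
      .config("spark.jars", "/opt/jars/postgresql-42.2.5.jar") // shipped to executors
      .getOrCreate()

    val df = spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/mydb")   // placeholder
      .option("dbtable", "public.users")
      .option("driver", "org.postgresql.Driver")
      .load()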
2 votes, 2 answers

Spark: Job restart and retries

Suppose you have Spark with the Standalone cluster manager. You opened a spark session with some configs and want to launch SomeSparkJob 40 times in parallel with different arguments. Questions: How do I set the retry count for job failures? How do I restart jobs…
asked by VB_
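A sketch of the distinction: Spark retries tasks (spark.task.maxFailures), not whole jobs, so job-level retries for the 40 parallel launches live in application code. The helper below is a hypothetical illustration:

    import scala.annotation.tailrec
    import scala.util.{Failure, Success, Try}

    // Hypothetical helper: re-run a Spark action up to `attempts` times
    @tailrec
    def withRetries[T](attempts: Int)(job: => T): T = Try(job) match {
      case Success(v)                 => v
      case Failure(_) if attempts > 1 => withRetries(attempts - 1)(job)
      case Failure(e)                 => throw e
    }

    // Usage: val n = withRetries(3) { spark.range(10).count() }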