Questions tagged [apache-spark-standalone]

Use for questions related to Apache Spark standalone deploy mode (not local mode).

This tag should be used for questions specific to the standalone deploy mode. Questions might cover cluster orchestration in standalone mode, or standalone-specific features and configuration options.

Spark standalone mode is an alternative to running Spark on Mesos or YARN. It provides a simpler setup than the more sophisticated resource managers, which can be useful on a dedicated Spark cluster (i.e. one not running other jobs).

"Standalone" speaks to the nature of running "alone" without an external resource manager.

164 questions
6 votes, 3 answers

SparkUI not showing Tab (Jobs, Stages, Storage, Environment,...) when run in standalone mode

I'm running the Spark master with the following command: ./sbin/start-master.sh. After that I went to http://localhost:8080 and saw the following page. I was expecting to see tabs with Jobs, Environment, ... like the following. Could someone…
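
For what it's worth, port 8080 serves the standalone master UI (workers and running applications), while the Jobs/Stages/Storage/Environment tabs belong to the application UI that a live SparkContext serves on port 4040 by default. A small sketch, with a placeholder master URL, that keeps the context alive long enough to browse that UI:

    import org.apache.spark.sql.SparkSession

    object UiCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ui-check")
          .master("spark://master-host:7077") // placeholder master URL
          .getOrCreate()

        // Run something so the Jobs and Stages tabs have content.
        spark.range(10000000L).selectExpr("sum(id)").show()

        // The application UI (Jobs, Stages, Storage, Environment, ...) lives at
        // http://<driver-host>:4040 and only exists while the SparkContext runs.
        Thread.sleep(10 * 60 * 1000) // keep the app alive for 10 minutes
        spark.stop()
      }
    }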
6 votes, 2 answers

How to add the "--deploy-mode cluster" option to my scala code

Hello, I want to add the option "--deploy-mode cluster" to my Scala code: val sparkConf = new SparkConf().setMaster("spark://192.168.60.80:7077") Without using the shell (the spark-submit command), I want to use the "…
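
One commonly suggested way to request cluster deploy mode from code, rather than from the spark-submit shell command, is the org.apache.spark.launcher.SparkLauncher API. This is only a sketch: the jar path and main class are placeholders, and in standalone cluster mode the application jar has to be reachable from the worker nodes.

    import org.apache.spark.launcher.SparkLauncher

    object ClusterModeSubmit {
      def main(args: Array[String]): Unit = {
        // Roughly equivalent to:
        //   spark-submit --master spark://192.168.60.80:7077 --deploy-mode cluster ...
        val handle = new SparkLauncher()
          .setAppResource("/path/to/my-app.jar")   // placeholder application jar
          .setMainClass("com.example.MyApp")       // placeholder main class
          .setMaster("spark://192.168.60.80:7077")
          .setDeployMode("cluster")
          .startApplication()

        // Wait until the submitted application reaches a final state.
        while (!handle.getState.isFinal) Thread.sleep(1000)
        println(s"Application finished in state ${handle.getState}")
      }
    }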
5 votes, 3 answers

PySpark: Not able to create SparkSession.(Java Gateway Error)

I have installed PySpark on Windows and was having no problem till yesterday. I am using Windows 10, PySpark version 2.3.3 (pre-built version), Java version "1.8.0_201". Yesterday when I tried creating a Spark session, I ran into the below…
5 votes, 0 answers

Spark Streaming - Block replication policy issue in case of multiple executor on the same worker

I am running a Spark Streaming application on a cluster composed of three nodes, each node with one worker and three executors (so a total of 9 executors). I am using Spark version 2.3.2 and the Spark standalone cluster manager. The…
5 votes, 1 answer

Why is Apache Livy session showing Application id NULL?

I've implemented a fully functional Spark 2.1.1 standalone cluster, where I POST job batches via the curl command using Apache Livy 0.4. When consulting the Spark web UI I see my job along with its application ID (something like this:…
5 votes, 1 answer

How to set up cluster environment for Spark applications on Windows machines?

I have been developing in PySpark with Spark standalone in non-cluster mode. These days, I would like to explore Spark's cluster mode further. I searched the internet and found I may need a cluster manager to run clusters on different machines…
5 votes, 2 answers

how to run Spark job on specific nodes

For example, my Spark cluster has 100 nodes (workers); when I run one job I just want it to run on some 10 specific nodes. How should I achieve this? By the way, I'm using Spark standalone mode. Why do I need the above requirement: one of my Spark jobs…
5 votes, 2 answers

winutils spark windows installation env_variable

I am trying to install Spark 1.6.1 on Windows 10 and so far I have done the following... Downloaded Spark 1.6.1, unpacked it to some directory and then set SPARK_HOME. Downloaded Scala 2.11.8, unpacked it to some directory and then set SCALA_HOME. Set the…
5 votes, 1 answer

Access spark-shell from different Spark versions

TL;DR: Is it absolutely necessary that the Spark running a spark-shell (driver) have exactly the same version as the Spark master? I am using Spark 1.5.0 to connect to Spark 1.5.0-cdh5.5.0 via spark-shell: spark-shell --master…
4 votes, 3 answers

Starting multiple workers on a master node in Standalone mode

I have a machine with 80 cores. I'd like to start a Spark server in standalone mode on this machine with 8 executors, each with 10 cores. But, when I try to start my second worker on the master, I get an error. $ ./sbin/start-master.sh Starting…
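
One route that is often suggested for this layout (treat it as an assumption, since behaviour differs between Spark versions) is to keep a single 80-core worker and let the application split it into executors by setting spark.executor.cores; the other common route is running several workers per machine via SPARK_WORKER_INSTANCES in conf/spark-env.sh. A sketch of the first approach, with a placeholder master URL and assumed memory settings:

    import org.apache.spark.sql.SparkSession

    object ManyExecutorsOneWorker {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("many-executors")
          .master("spark://master-host:7077")     // placeholder master URL
          // With 10 cores per executor and an 80-core cap, the standalone
          // scheduler can place up to 8 executors on a single 80-core worker.
          .config("spark.executor.cores", "10")
          .config("spark.cores.max", "80")
          .config("spark.executor.memory", "8g")  // assumed memory per executor
          .getOrCreate()

        println(spark.range(100000000L).count())
        spark.stop()
      }
    }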
4 votes, 0 answers

spark-submit: unable to get driver status

I'm running a job on a test Spark standalone cluster in cluster mode, but I'm finding myself unable to monitor the status of the driver. Here is a minimal example using spark-2.4.3 (master and one worker running on the same node, started running…
4 votes, 1 answer

pyspark got Py4JNetworkError("Answer from Java side is empty") when exit python

Background: Spark standalone cluster mode on k8s; Spark 2.2.1; Hadoop 2.7.6; the code is run from plain Python, not the pyspark shell; client mode, not cluster mode. Everything runs fine and completes. But 'sometimes', when…
4 votes, 0 answers

Spark launcher handle not updating state on Standalone cluster mode

I'm trying to programmatically submit Spark jobs using the Spark Launcher library in a Spring web application. Everything works fine with yarn-client, yarn-cluster and standalone-client modes. However, when using standalone-cluster mode, the…
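
For reference, the launcher reports state updates through SparkAppHandle.Listener callbacks; a minimal sketch of wiring one up follows (jar path, main class and master URL are placeholders). Whether those callbacks actually fire in standalone-cluster mode is exactly what this question is about, so this only shows the expected usage, not a fix.

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    object LauncherWithListener {
      def main(args: Array[String]): Unit = {
        val listener = new SparkAppHandle.Listener {
          override def stateChanged(h: SparkAppHandle): Unit =
            println(s"state -> ${h.getState}, appId = ${h.getAppId}")
          override def infoChanged(h: SparkAppHandle): Unit = ()
        }

        val handle = new SparkLauncher()
          .setAppResource("/path/to/my-app.jar")  // placeholder application jar
          .setMainClass("com.example.MyApp")      // placeholder main class
          .setMaster("spark://master-host:7077")  // placeholder master URL
          .setDeployMode("cluster")
          .startApplication(listener)

        while (!handle.getState.isFinal) Thread.sleep(1000)
      }
    }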
4 votes, 2 answers

Why does stopping Standalone Spark master fail with "no org.apache.spark.deploy.master.Master to stop"?

Stopping a standalone Spark master fails with the following message: $ ./sbin/stop-master.sh no org.apache.spark.deploy.master.Master to stop Why? There is one Spark standalone master up and running.
4 votes, 2 answers

How many executor processes run for each worker node in spark?

How many executors will be launched for each worker node in Spark? Can I know the math behind it? For example, I have 6 worker nodes and 1 master; if I submit a job through spark-submit, what is the maximum number of executors that will be launched for…
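
As a rough illustration of the arithmetic (all numbers here are assumptions, not taken from the question): in standalone mode an application keeps acquiring cores until it reaches spark.cores.max, and each executor takes spark.executor.cores of them, so the executor count is bounded by the cap divided by the per-executor cores, and by what each worker can physically host.

    import org.apache.spark.sql.SparkSession

    object ExecutorMath {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("executor-math")
          .master("spark://master-host:7077")   // placeholder master URL
          // Assume 6 workers with 16 cores each (96 cores in total).
          // With 4 cores per executor and a 48-core cap, at most
          // 48 / 4 = 12 executors are launched, i.e. about 2 per worker.
          .config("spark.executor.cores", "4")
          .config("spark.cores.max", "48")
          .getOrCreate()

        println(spark.range(1000000L).count())
        spark.stop()
      }
    }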