Questions tagged [apache-spark-standalone]

Use for questions related to Apache Spark standalone deploy mode (not local mode).

This tag should be used for questions specific to the standalone deploy mode, such as cluster orchestration in standalone mode or standalone-specific features and configuration options.

Spark standalone mode is an alternative to running Spark on Mesos or YARN. It provides a simpler alternative to more sophisticated resource managers, which can be useful on a dedicated Spark cluster (i.e. one that does not run other jobs).

"Standalone" speaks to the nature of running "alone" without an external resource manager.

164 questions
0 votes, 0 answers

How to configure thread count on Spark Driver node?

We are running a Spark Streaming job in standalone cluster mode with the deploy mode set to client. The streaming job periodically polls messages from a Kafka topic, and the logs generated at the driver node are flushed to a txt file. After running…
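
There is no single "driver thread count" knob; as a hedged sketch under this question's setup (standalone cluster, client deploy mode), driver-side resources are usually tuned through spark-submit options like the ones below, with the caveat that spark.driver.cores only takes effect in cluster deploy mode (in client mode the driver runs in the submitting JVM). The values, class name, and paths are illustrative:

    ./bin/spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode client \
      --driver-memory 4g \
      --conf spark.driver.cores=2 \
      --class com.example.StreamingJob \
      /path/to/streaming-job.jar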
0 votes, 0 answers

Unable to run Spark program on Standalone mode (Error in Client and Cluster mode)

I have a single Ubuntu server where I run a master and a slave (one executor), and they show up on the 8080 UI. I can run spark-shell --master spark://foo.bar:7077 successfully, but I can't submit my program (a fat jar) successfully and I get errors…
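
For reference, a hedged sketch of submitting a fat jar to the standalone master from the question in both deploy modes; the class name and jar path are placeholders:

    # client mode: the driver runs in the spark-submit JVM
    ./bin/spark-submit --master spark://foo.bar:7077 --deploy-mode client \
      --class com.example.Main /path/to/app-assembly.jar
    # cluster mode: the jar must be reachable from the worker that launches the
    # driver (shared filesystem, HDFS/HTTP URL, or copied to every node)
    ./bin/spark-submit --master spark://foo.bar:7077 --deploy-mode cluster \
      --class com.example.Main /path/to/app-assembly.jar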
0 votes, 0 answers

Errors when running pyspark shell or from jupyter notebook

I'm trying to run a pyspark shell, but when doing: (test3.8python) [test@JupyterHub ~]$ python3 /home/test/spark3.1.1/bin/pyspark I get the following error: File "/home/test/spark3.1.1/bin/pyspark", line 20 if [ -z "${SPARK_HOME}" ]; then …
nonoDa • 413 • 2 • 16
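
The error in this question comes from feeding a shell script to the Python interpreter: bin/pyspark is a bash launcher, not a Python file. A hedged sketch of the usual invocation, reusing the path from the question and choosing the interpreter through PYSPARK_PYTHON:

    # run the launcher script directly; it sets up SPARK_HOME and starts Python
    export PYSPARK_PYTHON=python3
    /home/test/spark3.1.1/bin/pyspark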
0 votes, 1 answer

Spark errors with UDFs running on standalone

I'm running my Spark program, which works locally but not remotely. My program has these components (containers): my application, which is based on Spring (for REST calls), initiates a driver (a SparkSession via getOrCreate) and has all the…
ChopChop • 13 • 5
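
A common cause for UDFs failing only on a remote standalone cluster is that the classes backing the UDF are not on the executors' classpath. A hedged sketch of shipping them, shown via spark-submit for illustration (the Spring application in the question builds its own SparkSession, so the equivalent there would be setting spark.jars on that session); names and paths are placeholders:

    ./bin/spark-submit --master spark://master-host:7077 \
      --jars /path/to/udf-classes.jar \
      --class com.example.RestApp /path/to/app.jar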
0 votes, 1 answer

Prepend spark.jars to workers classpath

My use case is pretty simple: I want to override a few classes that are part of the Hadoop distribution. To do so I created a new jar that I ship from the driver to the worker nodes using the spark.jars property. To make sure my new jar takes…
LiranBo • 2,054 • 2 • 23 • 39
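
For reference, a hedged sketch of the two standard knobs for class-loading precedence: extraClassPath entries are prepended to the JVM classpath of the driver and executors, but the files must already exist at that path on every node. Paths and the class name are placeholders:

    # prepend a jar to driver and executor classpaths (must exist on each node)
    ./bin/spark-submit --master spark://master-host:7077 \
      --conf spark.driver.extraClassPath=/opt/jars/my-override.jar \
      --conf spark.executor.extraClassPath=/opt/jars/my-override.jar \
      --class com.example.Main /path/to/app.jar
    # alternatively, give user jars priority over Spark's bundled classes
    # (documented as experimental):
    #   --conf spark.driver.userClassPathFirst=true
    #   --conf spark.executor.userClassPathFirst=true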
0 votes, 1 answer

Spark failed to launch org.apache.spark.deploy.worker.Worker on master

I have set up a Spark standalone cluster on two Ubuntu servers (a master and one slave). I configured conf/spark-env.sh (after copying it from spark-env.sh.template) as follows: SPARK_MASTER_HOST="master" I started spark-master successfully on the master by…
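
For context, a hedged sketch of a minimal two-node standalone setup; the hostname must be resolvable from both machines, and the resource values are purely illustrative:

    # conf/spark-env.sh (both nodes)
    SPARK_MASTER_HOST=master
    SPARK_WORKER_CORES=4
    SPARK_WORKER_MEMORY=8g
    # on the master node:
    ./sbin/start-master.sh
    # on the slave node (start-worker.sh on newer releases):
    ./sbin/start-slave.sh spark://master:7077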
0 votes, 1 answer

SparkLauncher Standalone Cluster Mode

Using the Java API, I'm able to submit, get the status of, and kill Spark applications submitted via SparkLauncher in 'client' mode. Can SparkLauncher track and control applications submitted in standalone 'cluster' mode?
0 votes, 1 answer

Spark Standalone how to pass local .jar file to cluster

I have a cluster with two workers and one master. To start the master and workers I use sbin/start-master.sh and sbin/start-slaves.sh on the master's machine. Then the master UI shows me that the slaves are ALIVE (so everything is OK so far). Issue…
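
A hedged sketch of how local jars typically reach a standalone cluster: in client deploy mode the application jar and anything listed in --jars are served to the executors from the driver, while in cluster mode the paths have to be reachable from the worker that launches the driver (shared filesystem, HDFS, or an HTTP URL). Names and paths are placeholders:

    ./bin/spark-submit --master spark://master-host:7077 --deploy-mode client \
      --jars /local/path/dep1.jar,/local/path/dep2.jar \
      --class com.example.Main /local/path/app.jar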
0 votes, 1 answer

Worker nodes out of space because Spark streaming application executors hold on to jar files

My Spark streaming application is running in standalone mode; executors which have finished are still holding jar files. After a couple of days it starts failing because the worker nodes run out of space. How can we delete these completed…
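
The standalone worker can purge finished applications' work directories itself; a hedged sketch of enabling that in conf/spark-env.sh on each worker (the intervals are in seconds and purely illustrative):

    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=86400"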
0 votes, 1 answer

What is the difference between --master local[n] and --total-executor-cores=n (Spark standalone)?

I have a Spark standalone cluster with 4 nodes, each with 56 cores. When I run the same job with --master local[56] and with --master spark://... --executor-cores 56 --total-executor-cores 56 (which I think are equivalent), I find their performances are…
Litchy • 623 • 7 • 23
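
For reference, a hedged sketch of the two submissions being compared: local[56] runs everything as 56 task threads inside a single JVM, while the standalone flags request at most 56 cores in total with 56 per executor, i.e. effectively one 56-core executor on one worker. The master URL, class name, and jar path are placeholders:

    # single-JVM local mode, 56 task threads, no cluster involved
    ./bin/spark-submit --master local[56] \
      --class com.example.Job /path/to/job.jar
    # standalone cluster: one executor with 56 cores on one of the 4 nodes
    ./bin/spark-submit --master spark://master-host:7077 \
      --executor-cores 56 --total-executor-cores 56 \
      --class com.example.Job /path/to/job.jar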
0 votes, 0 answers

Why do Spark tasks seem to run in sequence?

Here's what I'm doing step by step: load a file with 32 minimum partitions, do some operations (map, ..., create a Dataset from the RDD, SQL on the Dataset), save the result as a Parquet file. My problem is that when I check the Spark UI I find my job has 32 tasks, 8 of them are…
M-BNCH • 393 • 1 • 3 • 18
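
Seeing only 8 of 32 tasks run at once usually means the application was granted 8 cores; on a standalone cluster the concurrent-task ceiling is the number of cores the app holds (--total-executor-cores / spark.cores.max). A hedged sketch with illustrative values:

    ./bin/spark-submit --master spark://master-host:7077 \
      --executor-cores 8 --total-executor-cores 32 \
      --class com.example.Job /path/to/job.jar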
0 votes, 2 answers

Apache Spark standalone scheduler - why does the driver need a whole core in 'cluster' mode?

In Spark's 'client' deploy mode the Spark driver does not consume cluster cores; only the application's executors do. But why does the Spark driver need a core for itself in 'cluster' mode?
tooptoop4 • 234 • 3 • 15 • 45
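
Background for this one: in standalone cluster deploy mode the master schedules the driver itself onto a worker, so the driver reserves worker cores like any other process; the amount is controlled by --driver-cores (default 1, only honoured in cluster mode). A hedged sketch with placeholder names:

    ./bin/spark-submit --master spark://master-host:7077 --deploy-mode cluster \
      --driver-cores 1 \
      --class com.example.Main /path/to/app.jar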
0 votes, 2 answers

Logging using Logback on Spark standalone

We are using Spark standalone 2.3.2 and logback-core/logback-classic 1.2.3. We have a very simple Logback configuration file which allows us to log data to a specific directory, and locally I can pass the VM parameters from the editor…
skjagini • 3,142 • 5 • 34 • 63
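
A hedged sketch of pointing both driver and executors at a Logback configuration on a standalone cluster: the file is shipped with --files so executors can reference it by bare name in their working directory, while the driver uses the local path. Note that Spark itself bundles log4j bindings, so the application typically also has to win the SLF4J binding resolution; paths and the class name are placeholders:

    ./bin/spark-submit --master spark://master-host:7077 \
      --files /local/path/logback.xml \
      --driver-java-options "-Dlogback.configurationFile=/local/path/logback.xml" \
      --conf spark.executor.extraJavaOptions=-Dlogback.configurationFile=logback.xml \
      --class com.example.Main /path/to/app.jar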
0 votes, 1 answer

SPARK 2.4 Standalone + Multiple Workers on a single multi-core server; Submissions are waiting on resources

On a reasonably equipped 64-bit Fedora (home) server with 12 cores and 64 GB RAM, I have Spark 2.4 running in standalone mode with the following configuration in ./spark-env.sh (not shown are the items in that file that I have left commented…
NYCeyes • 5,215 • 6 • 57 • 64
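
For context, multiple workers on one machine are configured through SPARK_WORKER_INSTANCES, and a submission sits in the WAITING state whenever its core/memory request cannot be met by what the workers are offering. A hedged sketch for a 12-core / 64 GB box, with illustrative values in conf/spark-env.sh:

    SPARK_WORKER_INSTANCES=3
    SPARK_WORKER_CORES=4
    SPARK_WORKER_MEMORY=16g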
0 votes, 1 answer

Spark structured streaming: process each row on different worker nodes as soon as it arrives

Using Spark 2.3 Structured Streaming with Kafka as the input stream. My cluster consists of a master and 3 workers (the master runs on one of the worker machines). My Kafka topic has 3 partitions, the same as the number of workers. I am using the default…
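
With 3 Kafka partitions, requesting one core per executor and three cores in total gives three single-core executors; the standalone scheduler spreads applications across workers by default, so each partition can be consumed on a different node. A hedged sketch (the package coordinates match Spark 2.3 / Scala 2.11; master URL, class name, and paths are placeholders):

    ./bin/spark-submit --master spark://master-host:7077 \
      --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 \
      --executor-cores 1 --total-executor-cores 3 \
      --class com.example.StreamingApp /path/to/app.jar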