Questions tagged [apache-spark-standalone]

Use for questions related to Apache Spark standalone deploy mode (not local mode).

This tag should be used for questions specific to the standalone deploy mode, such as cluster orchestration in standalone mode or standalone-specific features and configuration options.

Spark standalone mode is an alternative to running Spark on Mesos or YARN. It provides a simpler alternative to more sophisticated resource managers, which can be useful on a dedicated Spark cluster (i.e. one that does not run other jobs).

"Standalone" speaks to the nature of running "alone" without an external resource manager.

164 questions
0 votes, 0 answers

How to configure thread count on Spark Driver node?

We are running a Spark Streaming job in standalone cluster mode with the deploy mode set to client. The streaming job periodically polls messages from a Kafka topic, and the logs generated at the driver node are flushed to a txt file. After running…
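
There is no single "driver thread count" knob; as a hedged sketch under this question's setup (standalone cluster, client deploy mode), driver-side resources are usually tuned through spark-submit options like the ones below, with the caveat that spark.driver.cores only takes effect in cluster deploy mode (in client mode the driver runs in the submitting JVM). The values, class name, and paths are illustrative:

    ./bin/spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode client \
      --driver-memory 4g \
      --conf spark.driver.cores=2 \
      --class com.example.StreamingJob \
      /path/to/streaming-job.jar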
0 votes, 0 answers

Unable to run Spark program on Standalone mode (Error in Client and Cluster mode)

I have a single Ubuntu server where I run a master and a slave (one executor), and they show up on the 8080 UI. I can run spark-shell --master spark://foo.bar:7077 successfully, but I can't submit my program (a fat jar) successfully and I get errors…
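
For reference, a hedged sketch of submitting a fat jar to the standalone master from the question in both deploy modes; the class name and jar path are placeholders:

    # client mode: the driver runs in the spark-submit JVM
    ./bin/spark-submit --master spark://foo.bar:7077 --deploy-mode client \
      --class com.example.Main /path/to/app-assembly.jar
    # cluster mode: the jar must be reachable from the worker that launches the
    # driver (shared filesystem, HDFS/HTTP URL, or copied to every node)
    ./bin/spark-submit --master spark://foo.bar:7077 --deploy-mode cluster \
      --class com.example.Main /path/to/app-assembly.jar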
0 votes, 0 answers

Errors when running pyspark shell or from jupyter notebook

I'm trying to run a pyspark shell, but when doing: (test3.8python) [test@JupyterHub ~]$ python3 /home/test/spark3.1.1/bin/pyspark I get the following error: File "/home/test/spark3.1.1/bin/pyspark", line 20 if [ -z "${SPARK_HOME}" ]; then …
nonoDa • 413 • 2 • 16
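
The error in this question comes from feeding a shell script to the Python interpreter: bin/pyspark is a bash launcher, not a Python file. A hedged sketch of the usual invocation, reusing the path from the question and choosing the interpreter through PYSPARK_PYTHON:

    # run the launcher script directly; it sets up SPARK_HOME and starts Python
    export PYSPARK_PYTHON=python3
    /home/test/spark3.1.1/bin/pyspark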
0 votes, 1 answer

Spark errors with UDFs running on standalone

I'm running my Spark program, which works locally but not remotely. My program has these components (containers): my application, which is based on Spring (for REST calls), initiates a driver (a SparkSession via getOrCreate) and has all the…
ChopChop • 13 • 5
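
A common cause for UDFs failing only on a remote standalone cluster is that the classes backing the UDF are not on the executors' classpath. A hedged sketch of shipping them, shown via spark-submit for illustration (the Spring application in the question builds its own SparkSession, so the equivalent there would be setting spark.jars on that session); names and paths are placeholders:

    ./bin/spark-submit --master spark://master-host:7077 \
      --jars /path/to/udf-classes.jar \
      --class com.example.RestApp /path/to/app.jar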
0 votes, 1 answer

Prepend spark.jars to workers classpath

My use case is pretty simple: I want to override a few classes that are part of the Hadoop distribution. To do so I created a new jar that I ship from the driver to the worker nodes using the spark.jars property. To make sure my new jar takes…
LiranBo • 2,054 • 2 • 23 • 39
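
For reference, a hedged sketch of the two standard knobs for class-loading precedence: extraClassPath entries are prepended to the JVM classpath of the driver and executors, but the files must already exist at that path on every node. Paths and the class name are placeholders:

    # prepend a jar to driver and executor classpaths (must exist on each node)
    ./bin/spark-submit --master spark://master-host:7077 \
      --conf spark.driver.extraClassPath=/opt/jars/my-override.jar \
      --conf spark.executor.extraClassPath=/opt/jars/my-override.jar \
      --class com.example.Main /path/to/app.jar
    # alternatively, give user jars priority over Spark's bundled classes
    # (documented as experimental):
    #   --conf spark.driver.userClassPathFirst=true
    #   --conf spark.executor.userClassPathFirst=true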
0 votes, 1 answer

Spark failed to launch org.apache.spark.deploy.worker.Worker on master

I have set up a Spark standalone cluster on two Ubuntu servers (a master and one slave). I configured conf/spark-env.sh (after copying it from spark-env.sh.template) as follows: SPARK_MASTER_HOST="master" I started spark-master successfully on the master by…
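
For context, a hedged sketch of a minimal two-node standalone setup; the hostname must be resolvable from both machines, and the resource values are purely illustrative:

    # conf/spark-env.sh (both nodes)
    SPARK_MASTER_HOST=master
    SPARK_WORKER_CORES=4
    SPARK_WORKER_MEMORY=8g
    # on the master node:
    ./sbin/start-master.sh
    # on the slave node (start-worker.sh on newer releases):
    ./sbin/start-slave.sh spark://master:7077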
0 votes, 1 answer

SparkLauncher Standalone Cluster Mode

Using the Java API, I'm able to submit, get the status of, and kill Spark applications submitted via SparkLauncher in 'client' mode. Can SparkLauncher track and control applications submitted in standalone 'cluster' mode?
0 votes, 1 answer

Spark Standalone how to pass local .jar file to cluster

I have a cluster with two workers and one master. To start the master and workers I use sbin/start-master.sh and sbin/start-slaves.sh on the master's machine. Then the master UI shows me that the slaves are ALIVE (so everything is OK so far). Issue…
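
A hedged sketch of how local jars typically reach a standalone cluster: in client deploy mode the application jar and anything listed in --jars are served to the executors from the driver, while in cluster mode the paths have to be reachable from the worker that launches the driver (shared filesystem, HDFS, or an HTTP URL). Names and paths are placeholders:

    ./bin/spark-submit --master spark://master-host:7077 --deploy-mode client \
      --jars /local/path/dep1.jar,/local/path/dep2.jar \
      --class com.example.Main /local/path/app.jar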
0 votes, 1 answer

Worker nodes out of space because Spark streaming application executors hold on to jar files

My Spark streaming application is running in standalone mode; executors which have finished are still holding jar files. After a couple of days it starts failing because the worker nodes run out of space. How can we delete these completed…
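
The standalone worker can purge finished applications' work directories itself; a hedged sketch of enabling that in conf/spark-env.sh on each worker (the intervals are in seconds and purely illustrative):

    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=86400"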
0 votes, 1 answer

What is the difference between --master local[n] and --total-executor-cores=n (Spark standalone)?

I have a Spark standalone cluster with 4 nodes, each with 56 cores. When I run the same job with --master local[56] and with --master spark://... --executor-cores 56 --total-executor-cores 56 (which I think are equivalent), I find their performances are…
Litchy • 623 • 7 • 23
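
For reference, a hedged sketch of the two submissions being compared: local[56] runs everything as 56 task threads inside a single JVM, while the standalone flags request at most 56 cores in total with 56 per executor, i.e. effectively one 56-core executor on one worker. The master URL, class name, and jar path are placeholders:

    # single-JVM local mode, 56 task threads, no cluster involved
    ./bin/spark-submit --master local[56] \
      --class com.example.Job /path/to/job.jar
    # standalone cluster: one executor with 56 cores on one of the 4 nodes
    ./bin/spark-submit --master spark://master-host:7077 \
      --executor-cores 56 --total-executor-cores 56 \
      --class com.example.Job /path/to/job.jar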
0 votes, 0 answers

Why do Spark tasks seem to run in sequence?

Here's what I'm doing step by step: load a file with 32 minimum partitions, do some operations (map, ..., create a Dataset from the RDD, SQL on the Dataset), save the result as a Parquet file. My problem is that when I check the Spark UI I find my job has 32 tasks, 8 of them are…
M-BNCH • 393 • 1 • 3 • 18
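
Seeing only 8 of 32 tasks run at once usually means the application was granted 8 cores; on a standalone cluster the concurrent-task ceiling is the number of cores the app holds (--total-executor-cores / spark.cores.max). A hedged sketch with illustrative values:

    ./bin/spark-submit --master spark://master-host:7077 \
      --executor-cores 8 --total-executor-cores 32 \
      --class com.example.Job /path/to/job.jar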
0 votes, 2 answers

Apache Spark standalone scheduler - why does the driver need a whole core in 'cluster' mode?

In Spark's 'client' deploy mode the Spark driver does not consume cluster cores; only the application's executors do. But why does the Spark driver need a core for itself in 'cluster' mode?
tooptoop4 • 234 • 3 • 15 • 45
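
Background for this one: in standalone cluster deploy mode the master schedules the driver itself onto a worker, so the driver reserves worker cores like any other process; the amount is controlled by --driver-cores (default 1, only honoured in cluster mode). A hedged sketch with placeholder names:

    ./bin/spark-submit --master spark://master-host:7077 --deploy-mode cluster \
      --driver-cores 1 \
      --class com.example.Main /path/to/app.jar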
0 votes, 2 answers

Logging using Logback on Spark standalone

We are using Spark standalone 2.3.2 and logback-core/logback-classic 1.2.3. We have a very simple Logback configuration file which allows us to log data to a specific directory, and locally I can pass the VM parameters from the editor…
skjagini • 3,142 • 5 • 34 • 63
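
A hedged sketch of pointing both driver and executors at a Logback configuration on a standalone cluster: the file is shipped with --files so executors can reference it by bare name in their working directory, while the driver uses the local path. Note that Spark itself bundles log4j bindings, so the application typically also has to win the SLF4J binding resolution; paths and the class name are placeholders:

    ./bin/spark-submit --master spark://master-host:7077 \
      --files /local/path/logback.xml \
      --driver-java-options "-Dlogback.configurationFile=/local/path/logback.xml" \
      --conf spark.executor.extraJavaOptions=-Dlogback.configurationFile=logback.xml \
      --class com.example.Main /path/to/app.jar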
0 votes, 1 answer

SPARK 2.4 Standalone + Multiple Workers on a single multi-core server; Submissions are waiting on resources

On a reasonably equipped 64-bit Fedora (home) server with 12 cores and 64 GB RAM, I have Spark 2.4 running in standalone mode with the following configuration in ./spark-env.sh (not shown are the items in that file that I have left commented…
NYCeyes • 5,215 • 6 • 57 • 64
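
For context, multiple workers on one machine are configured through SPARK_WORKER_INSTANCES, and a submission sits in the WAITING state whenever its core/memory request cannot be met by what the workers are offering. A hedged sketch for a 12-core / 64 GB box, with illustrative values in conf/spark-env.sh:

    SPARK_WORKER_INSTANCES=3
    SPARK_WORKER_CORES=4
    SPARK_WORKER_MEMORY=16g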
0 votes, 1 answer

Spark structured streaming: process each row on different worker nodes as soon as it arrives

Using Spark 2.3 Structured Streaming with Kafka as the input stream. My cluster consists of a master and 3 workers (the master runs on one of the worker machines). My Kafka topic has 3 partitions, the same as the number of workers. I am using the default…
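
With 3 Kafka partitions, requesting one core per executor and three cores in total gives three single-core executors; the standalone scheduler spreads applications across workers by default, so each partition can be consumed on a different node. A hedged sketch (the package coordinates match Spark 2.3 / Scala 2.11; master URL, class name, and paths are placeholders):

    ./bin/spark-submit --master spark://master-host:7077 \
      --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 \
      --executor-cores 1 --total-executor-cores 3 \
      --class com.example.StreamingApp /path/to/app.jar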