Questions tagged [apache-spark-standalone]

Use for questions related to Apache Spark standalone deploy mode (not local mode).

This tag should be used for questions specific to the standalone deploy mode. Questions might include cluster orchestration in standalone mode, or standalone-specific features and configuration options.

Spark standalone mode is an alternative to running Spark on Mesos or YARN. Standalone mode provides a simpler alternative to more sophisticated resource managers, which may be useful on a dedicated Spark cluster (i.e. one that does not run other jobs).

"Standalone" speaks to the nature of running "alone" without an external resource manager.

164 questions
1
vote
1 answer

Spark web UI unreachable

I have installed Spark 2.0.0 on 12 nodes (in standalone cluster mode). When I launch it I get this: ./sbin/start-all.sh starting org.apache.spark.deploy.master.Master, logging to…
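The standalone master serves its web UI on port 8080 by default, and each running application serves its own UI on 4040. A quick way to see which URL the application UI actually bound to is to ask the SparkContext; a sketch, assuming the placeholder master host from above:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://master-host:7077")  # placeholder master host
    .appName("ui-check")
    .getOrCreate()
)

# uiWebUrl reports the address the application UI bound to, which helps
# when the host/port you expected turns out to be unreachable.
print(spark.sparkContext.uiWebUrl)
spark.stop()
```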
1
vote
2 answers

How to run Spark distributed in cluster mode, but take the file locally?

Is it possible to have Spark take a local file as input, but process it distributed? I have sc.textFile("file:///path-to-file-locally") in my code, and I know that the exact path to the file is correct. Yet, I am still getting Py4JJavaError: An…
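A file:// path is resolved on every executor, so the file must exist at the same path on each worker node, not just on the driver. When it only exists on the driver, one workaround is to read it there and distribute the contents; a sketch with a hypothetical path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("local-file-demo").getOrCreate()
sc = spark.sparkContext

# Read the file on the driver only, then ship its lines to the cluster.
# Suitable for files that fit in driver memory; large inputs belong on
# shared storage (HDFS, S3, NFS) visible to every worker instead.
with open("/path-to-file-locally") as f:  # hypothetical path
    lines = f.read().splitlines()

rdd = sc.parallelize(lines)
print(rdd.count())
spark.stop()
```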
1
vote
1 answer

Not all nodes used in Spark Standalone cluster

I have made a Spark standalone cluster with two virtual machines. On the 1st VM (8 cores, 64 GB memory), I started the master manually using the command bin/spark-class org.apache.spark.deploy.master.Master. On the 2nd VM (8 cores, 64 GB memory), I…
Abhilash Awasthi
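In standalone mode, whether an application lands on all workers is governed mostly by spark.cores.max (the total cores the app may claim) and spark.executor.cores, while the master's spark.deploy.spreadOut setting (true by default) spreads executors across workers. A hedged sketch of the application-side settings, with a placeholder master host:

```python
from pyspark.sql import SparkSession

# With cores.max=8 and executor.cores=4, the standalone master can place
# two 4-core executors, and with its default spreadOut=true policy it
# prefers putting them on different workers.
spark = (
    SparkSession.builder
    .master("spark://master-host:7077")  # placeholder master host
    .appName("use-all-workers")
    .config("spark.cores.max", "8")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "8g")
    .getOrCreate()
)
```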
1
vote
0 answers

Why are Spark executors trying to connect to spark_master instead of SPARK_MASTER_IP?

Using a Spark 1.6.1 standalone cluster. After a system restart (and only minor config changes to /etc/hosts per worker) Spark executors suddenly started throwing errors that they couldn't connect to spark_master. When I echo $SPARK_MASTER_IP on the…
crockpotveggies
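Name-resolution problems like this can often be sidestepped by pinning IPs explicitly instead of relying on /etc/hosts: point the application at the master's IP, and set spark.driver.host so executors connect back to an address they can always resolve. A sketch with placeholder addresses (on 1.6.x the analogous daemon-side variable is SPARK_MASTER_IP in conf/spark-env.sh):

```python
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setMaster("spark://10.0.0.5:7077")  # master's IP, not its hostname
    .setAppName("hostname-free")
    # Executors dial the driver back on this address; an IP avoids any
    # dependency on /etc/hosts being consistent across workers.
    .set("spark.driver.host", "10.0.0.6")  # placeholder driver IP
)
sc = SparkContext(conf=conf)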
1
vote
0 answers

Spark Streaming - Standalone Mode (cleanup not deleting data in the work folder for each app for each batch)

In Spark Streaming I have set these parameters as below: spark.worker.cleanup.enabled true, spark.worker.cleanup.interval 60, spark.worker.cleanup.appDataTtl 90. This clears out already-killed Spark batch/streaming jobs' data in…
0
votes
0 answers

Is it Possible to Choose Spark Executor Location

It's known that for Spark & Kafka integration we have some options for executor location, as described in the link: LocationStrategies. Is there any option like this for the storage layer? For example, let's assume I will integrate Spark with MinIO as the…
sem
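There is no LocationStrategies equivalent for object stores, since S3-style storage reports no block locality to the scheduler; Spark typically reads MinIO through the Hadoop S3A connector instead. A hedged sketch of the wiring, with placeholder endpoint and credentials (assumes the hadoop-aws package is on the classpath):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("minio-read")
    # All values below are placeholders for your MinIO deployment.
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio-host:9000")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

df = spark.read.csv("s3a://bucket/data.csv", header=True)
df.show()
```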
0
votes
0 answers

Spark executors exiting frequently and "Initial job has not accepted any resources"

I have a remote standalone Spark cluster running in 2 Docker containers, spark-master and spark-worker. I am running a simple Python program to test connectivity to Spark, but I always get the following error: WARN TaskSchedulerImpl: Initial…
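The "Initial job has not accepted any resources" warning usually means one of two things: the workers have no free cores/memory for the request, or the executors start but cannot reach the driver, die, and are relaunched in a loop. With a Dockerised master, keeping the request small and making the driver's address and port explicit is a common first step; a sketch with placeholder values:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")
    .appName("connectivity-test")
    # Keep the request small so a single worker can satisfy it.
    .config("spark.executor.memory", "1g")
    .config("spark.cores.max", "1")
    # Executors must be able to reach the driver here; placeholders.
    .config("spark.driver.host", "192.168.1.10")
    .config("spark.driver.port", "35000")
    .getOrCreate()
)

print(spark.range(100).count())
spark.stop()
```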
0
votes
0 answers

Connecting to Spark Standalone cluster from Airflow

I have Airflow running in a local environment using a docker-compose file, and a Spark standalone cluster also running locally. I logged into the Airflow worker container and tried to submit the Spark job to the standalone Spark cluster, but the connection to the master node is…
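With the Apache Spark provider installed, Airflow can hand jobs to a standalone master through SparkSubmitOperator, given a Spark connection whose host is the spark:// URL; note the master hostname must resolve from inside the Airflow worker container (e.g. via a shared docker-compose network). A hedged sketch with placeholder connection ID and script path:

```python
from datetime import datetime
from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Assumes the apache-spark provider is installed and an Airflow connection
# "spark_standalone" exists with host "spark://spark-master" and port 7077
# (placeholder names; the host must resolve from inside the worker container).
with DAG("spark_submit_example", start_date=datetime(2023, 1, 1), schedule=None) as dag:
    submit = SparkSubmitOperator(
        task_id="submit_job",
        conn_id="spark_standalone",
        application="/opt/airflow/jobs/etl.py",  # placeholder script path
        verbose=True,
    )
```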
0
votes
0 answers

Define specific spark executors per worker on a Spark cluster of 3 worker nodes

I have a Spark cluster of 3 servers (1 worker per server = 3 workers). The resources are very much the same across servers (70 cores, 386 GB of RAM each). I also have an application that I spark-submit with 120 cores and 200 GB of RAM (24…
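On standalone, the executor count falls out of the arithmetic: the application claims up to spark.cores.max cores, carved into executors of spark.executor.cores each, and the master's default spreadOut policy balances them across workers. For example, cores.max=120 with executor.cores=5 yields 24 executors, roughly 8 per worker across three 70-core workers. A sketch of the relevant settings, with a placeholder master host:

```python
from pyspark.sql import SparkSession

# 120 total cores / 5 cores per executor = 24 executors; spread across
# three 70-core workers this lands about 8 executors on each.
spark = (
    SparkSession.builder
    .master("spark://master-host:7077")     # placeholder master host
    .appName("executor-shaping")
    .config("spark.cores.max", "120")
    .config("spark.executor.cores", "5")
    .config("spark.executor.memory", "8g")  # 24 executors x 8g ≈ 192 GB total
    .getOrCreate()
)
```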
0
votes
0 answers

SparkLauncher submit task error: command line is too long

I'm submitting a Spark standalone task using the SparkLauncher API from my local IDEA environment, but I'm having a problem: the command line is too long when submitting the task to the Spark standalone…
0
votes
0 answers

How to submit pyspark jobs to Spark Standalone cluster from Airflow in docker

As per the official Spark documentation, we can't run a PySpark application in cluster mode on a standalone cluster: "Currently, the standalone mode does not support cluster mode for Python applications." Then how can we submit a PySpark job to…
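The practical consequence is that Python applications on a standalone cluster run in client mode: the driver runs wherever spark-submit (or the SparkSession) is started, and only the executors run on the cluster. So the Airflow worker itself hosts the driver; a sketch with a placeholder master host:

```python
from pyspark.sql import SparkSession

# Client mode is implied: this process (e.g. the Airflow worker) becomes
# the driver, and the standalone master only schedules the executors.
spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")  # placeholder master host
    .appName("pyspark-client-mode")
    .getOrCreate()
)
print(spark.version)
spark.stop()
```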
0
votes
0 answers

Error on starting worker nodes in spark standalone cluster

I am trying to set up a Spark standalone cluster with 3 nodes. The Linux server configurations are: master node with 2 cores and 25 GB memory; worker node 1 with 4 cores and 21 GB memory; worker node 2 with 8 cores and 19 GB memory. I have started the…
shee8
0
votes
0 answers

Spark tasks stop from an unknown cause

I compress (bzip) CSV file data (400 GB or 1.2 TB) and write it to Postgres in a Spark standalone cluster. However, when Spark writes the data to PostgreSQL through the JDBC driver, the Spark job's tasks stop. I am not sure which task stopped.
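For large JDBC writes, throughput and stall behaviour are usually dominated by the number of parallel connections (one per partition) and the insert batch size, both plain DataFrame writer options. A hedged sketch with placeholder connection details (assumes the PostgreSQL JDBC driver is on the classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-write").getOrCreate()
df = spark.read.csv("s3a://bucket/input/*.csv.bz2", header=True)  # placeholder source

(
    df.repartition(32)  # one JDBC connection per partition
    .write
    .format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/mydb")  # placeholder URL
    .option("dbtable", "public.target_table")
    .option("user", "spark")
    .option("password", "secret")
    .option("batchsize", "10000")  # rows per INSERT batch
    .mode("append")
    .save()
)
```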
0
votes
0 answers

Spark driver / executors in docker containers with port translation

I'm trying to set up a Spark standalone cluster on a bunch of Docker containers in a private cloud. The executor processes, running on nodes different from the driver's node, are not able to connect back to the driver because the host port that is…
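Spark separates the address a process listens on from the address it advertises: spark.driver.bindAddress is the local (in-container) bind address, while spark.driver.host is what executors are told to dial back; pinning the ports makes them publishable by the container runtime. A sketch with placeholder values:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")       # placeholder master host
    .appName("docker-ports")
    # Listen on all interfaces inside the container...
    .config("spark.driver.bindAddress", "0.0.0.0")
    # ...but advertise the externally routable address to executors.
    .config("spark.driver.host", "10.0.0.20")  # placeholder host address
    # Fixed ports so they can be published by the container runtime.
    .config("spark.driver.port", "35000")
    .config("spark.blockManager.port", "35010")
    .getOrCreate()
)
```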
0
votes
0 answers

How to fix error when launching PySpark in standalone mode

I'm new to PySpark and I tried to launch a PySpark standalone cluster. I launched the master using: bin\spark-class2.cmd org.apache.spark.deploy.master.Master. I launched the worker using: bin\spark-class2.cmd org.apache.spark.deploy.worker.Worker -c 2…