Questions tagged [apache-spark-standalone]

Use for questions related to Apache Spark standalone deploy mode (not local mode).

This tag should be used for questions specific to the standalone deploy mode. Questions might include cluster orchestration in standalone mode, or standalone-specific features and configuration options.

Spark standalone mode is an alternative to running Spark on Mesos or YARN. Standalone mode provides a simpler alternative to more sophisticated resource managers, which may be useful on a dedicated Spark cluster (i.e. one that does not run other jobs).

"Standalone" speaks to the nature of running "alone" without an external resource manager.

164 questions
1
vote
1 answer

Spark web UI unreachable

I have installed Spark 2.0.0 on 12 nodes (in standalone cluster mode). When I launch it I get this: ./sbin/start-all.sh starting org.apache.spark.deploy.master.Master, logging to…
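The standalone master serves its web UI on port 8080 by default, and each running application serves its own UI on 4040. A quick way to see which URL the application UI actually bound to is to ask the SparkContext; a sketch, assuming the placeholder master host from above:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://master-host:7077")  # placeholder master host
    .appName("ui-check")
    .getOrCreate()
)

# uiWebUrl reports the address the application UI bound to, which helps
# when the host/port you expected turns out to be unreachable.
print(spark.sparkContext.uiWebUrl)
spark.stop()
```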
1
vote
2 answers

How to run Spark distributed in cluster mode, but take the file locally?

Is it possible to have Spark take a local file as input, but process it distributed? I have sc.textFile("file:///path-to-file-locally") in my code, and I know that the exact path to the file is correct. Yet, I am still getting Py4JJavaError: An…
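A file:// path is resolved on every executor, so the file must exist at the same path on each worker node, not just on the driver. When it only exists on the driver, one workaround is to read it there and distribute the contents; a sketch with a hypothetical path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("local-file-demo").getOrCreate()
sc = spark.sparkContext

# Read the file on the driver only, then ship its lines to the cluster.
# Suitable for files that fit in driver memory; large inputs belong on
# shared storage (HDFS, S3, NFS) visible to every worker instead.
with open("/path-to-file-locally") as f:  # hypothetical path
    lines = f.read().splitlines()

rdd = sc.parallelize(lines)
print(rdd.count())
spark.stop()
```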
1
vote
1 answer

Not all nodes used in Spark Standalone cluster

I have made a Spark standalone cluster with two virtual machines. On the 1st VM (8 cores, 64 GB memory), I started the master manually using the command bin/spark-class org.apache.spark.deploy.master.Master. On the 2nd VM (8 cores, 64 GB memory), I…
Abhilash Awasthi
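In standalone mode, whether an application lands on all workers is governed mostly by spark.cores.max (the total cores the app may claim) and spark.executor.cores, while the master's spark.deploy.spreadOut setting (true by default) spreads executors across workers. A hedged sketch of the application-side settings, with a placeholder master host:

```python
from pyspark.sql import SparkSession

# With cores.max=8 and executor.cores=4, the standalone master can place
# two 4-core executors, and with its default spreadOut=true policy it
# prefers putting them on different workers.
spark = (
    SparkSession.builder
    .master("spark://master-host:7077")  # placeholder master host
    .appName("use-all-workers")
    .config("spark.cores.max", "8")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "8g")
    .getOrCreate()
)
```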
1
vote
0 answers

Why are Spark executors trying to connect to spark_master instead of SPARK_MASTER_IP?

Using a Spark 1.6.1 standalone cluster. After a system restart (and only minor config changes to /etc/hosts per worker) Spark executors suddenly started throwing errors that they couldn't connect to spark_master. When I echo $SPARK_MASTER_IP on the…
crockpotveggies
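Name-resolution problems like this can often be sidestepped by pinning IPs explicitly instead of relying on /etc/hosts: point the application at the master's IP, and set spark.driver.host so executors connect back to an address they can always resolve. A sketch with placeholder addresses (on 1.6.x the analogous daemon-side variable is SPARK_MASTER_IP in conf/spark-env.sh):

```python
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setMaster("spark://10.0.0.5:7077")  # master's IP, not its hostname
    .setAppName("hostname-free")
    # Executors dial the driver back on this address; an IP avoids any
    # dependency on /etc/hosts being consistent across workers.
    .set("spark.driver.host", "10.0.0.6")  # placeholder driver IP
)
sc = SparkContext(conf=conf)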
1
vote
0 answers

Spark Streaming - Standalone Mode (cleanup not deleting data in the work folder for each app for each batch)

In Spark Streaming I have set these parameters as below: spark.worker.cleanup.enabled true, spark.worker.cleanup.interval 60, spark.worker.cleanup.appDataTtl 90. This clears out already-killed Spark batch/streaming jobs' data in…
0
votes
0 answers

Is it Possible to Choose Spark Executor Location

It's known that for Spark & Kafka integration we have some options for executor location, as described in the link: LocationStrategies. Is there any option like this for the storage layer? For example, let's assume I will integrate Spark with MinIO as the…
sem
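There is no LocationStrategies equivalent for object stores, since S3-style storage reports no block locality to the scheduler; Spark typically reads MinIO through the Hadoop S3A connector instead. A hedged sketch of the wiring, with placeholder endpoint and credentials (assumes the hadoop-aws package is on the classpath):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("minio-read")
    # All values below are placeholders for your MinIO deployment.
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio-host:9000")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

df = spark.read.csv("s3a://bucket/data.csv", header=True)
df.show()
```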
0
votes
0 answers

Spark executors exiting frequently and "Initial job has not accepted any resources"

I have a remote standalone Spark cluster running in 2 Docker containers, spark-master and spark-worker. I am running a simple Python program to test connectivity to Spark, but I always get the following error: WARN TaskSchedulerImpl: Initial…
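The "Initial job has not accepted any resources" warning usually means one of two things: the workers have no free cores/memory for the request, or the executors start but cannot reach the driver, die, and are relaunched in a loop. With a Dockerised master, keeping the request small and making the driver's address and port explicit is a common first step; a sketch with placeholder values:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")
    .appName("connectivity-test")
    # Keep the request small so a single worker can satisfy it.
    .config("spark.executor.memory", "1g")
    .config("spark.cores.max", "1")
    # Executors must be able to reach the driver here; placeholders.
    .config("spark.driver.host", "192.168.1.10")
    .config("spark.driver.port", "35000")
    .getOrCreate()
)

print(spark.range(100).count())
spark.stop()
```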
0
votes
0 answers

Connecting to Spark Standalone cluster from Airflow

I have Airflow running in a local environment using a docker-compose file, and a Spark standalone cluster also running locally. I logged into the Airflow worker container and tried to submit the Spark job to the standalone Spark cluster, but the connection to the master node is…
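With the Apache Spark provider installed, Airflow can hand jobs to a standalone master through SparkSubmitOperator, given a Spark connection whose host is the spark:// URL; note the master hostname must resolve from inside the Airflow worker container (e.g. via a shared docker-compose network). A hedged sketch with placeholder connection ID and script path:

```python
from datetime import datetime
from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Assumes the apache-spark provider is installed and an Airflow connection
# "spark_standalone" exists with host "spark://spark-master" and port 7077
# (placeholder names; the host must resolve from inside the worker container).
with DAG("spark_submit_example", start_date=datetime(2023, 1, 1), schedule=None) as dag:
    submit = SparkSubmitOperator(
        task_id="submit_job",
        conn_id="spark_standalone",
        application="/opt/airflow/jobs/etl.py",  # placeholder script path
        verbose=True,
    )
```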
0
votes
0 answers

Define specific spark executors per worker on a Spark cluster of 3 worker nodes

I have a Spark cluster of 3 servers (1 worker per server = 3 workers). The resources are very much the same across servers (70 cores, 386 GB of RAM each). I also have an application that I spark-submit with 120 cores and 200 GB of RAM (24…
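On standalone, the executor count falls out of the arithmetic: the application claims up to spark.cores.max cores, carved into executors of spark.executor.cores each, and the master's default spreadOut policy balances them across workers. For example, cores.max=120 with executor.cores=5 yields 24 executors, roughly 8 per worker across three 70-core workers. A sketch of the relevant settings, with a placeholder master host:

```python
from pyspark.sql import SparkSession

# 120 total cores / 5 cores per executor = 24 executors; spread across
# three 70-core workers this lands about 8 executors on each.
spark = (
    SparkSession.builder
    .master("spark://master-host:7077")     # placeholder master host
    .appName("executor-shaping")
    .config("spark.cores.max", "120")
    .config("spark.executor.cores", "5")
    .config("spark.executor.memory", "8g")  # 24 executors x 8g ≈ 192 GB total
    .getOrCreate()
)
```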
0
votes
0 answers

SparkLauncher submit task error: command line is too long

I'm submitting a Spark standalone task using the SparkLauncher API from my local IDEA environment, but I'm having a problem: the command line is too long when submitting the task to the Spark standalone…
0
votes
0 answers

How to submit pyspark jobs to Spark Standalone cluster from Airflow in docker

As per the official Spark documentation, we can't run a PySpark application in cluster mode on a standalone cluster: "Currently, the standalone mode does not support cluster mode for Python applications." Then how can we submit a PySpark job to…
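The practical consequence is that Python applications on a standalone cluster run in client mode: the driver runs wherever spark-submit (or the SparkSession) is started, and only the executors run on the cluster. So the Airflow worker itself hosts the driver; a sketch with a placeholder master host:

```python
from pyspark.sql import SparkSession

# Client mode is implied: this process (e.g. the Airflow worker) becomes
# the driver, and the standalone master only schedules the executors.
spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")  # placeholder master host
    .appName("pyspark-client-mode")
    .getOrCreate()
)
print(spark.version)
spark.stop()
```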
0
votes
0 answers

Error on starting worker nodes in spark standalone cluster

I am trying to set up a Spark standalone cluster with 3 nodes. The Linux server configurations are: master node with 2 cores and 25 GB memory; worker node 1 with 4 cores and 21 GB memory; worker node 2 with 8 cores and 19 GB memory. I have started the…
shee8
0
votes
0 answers

Spark tasks stop from an unknown cause

I compress (bzip) CSV file data (400 GB or 1.2 TB) and write it to Postgres in a Spark standalone cluster. However, when Spark writes the data to PostgreSQL through the JDBC driver, the Spark job's tasks stop. I am not sure which task stopped.
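For large JDBC writes, throughput and stall behaviour are usually dominated by the number of parallel connections (one per partition) and the insert batch size, both plain DataFrame writer options. A hedged sketch with placeholder connection details (assumes the PostgreSQL JDBC driver is on the classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-write").getOrCreate()
df = spark.read.csv("s3a://bucket/input/*.csv.bz2", header=True)  # placeholder source

(
    df.repartition(32)  # one JDBC connection per partition
    .write
    .format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/mydb")  # placeholder URL
    .option("dbtable", "public.target_table")
    .option("user", "spark")
    .option("password", "secret")
    .option("batchsize", "10000")  # rows per INSERT batch
    .mode("append")
    .save()
)
```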
0
votes
0 answers

Spark driver / executors in docker containers with port translation

I'm trying to set up a Spark standalone cluster on a bunch of Docker containers in a private cloud. The executor processes, running on nodes different from the driver's node, are not able to connect back to the driver because the host port that is…
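Spark separates the address a process listens on from the address it advertises: spark.driver.bindAddress is the local (in-container) bind address, while spark.driver.host is what executors are told to dial back; pinning the ports makes them publishable by the container runtime. A sketch with placeholder values:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")       # placeholder master host
    .appName("docker-ports")
    # Listen on all interfaces inside the container...
    .config("spark.driver.bindAddress", "0.0.0.0")
    # ...but advertise the externally routable address to executors.
    .config("spark.driver.host", "10.0.0.20")  # placeholder host address
    # Fixed ports so they can be published by the container runtime.
    .config("spark.driver.port", "35000")
    .config("spark.blockManager.port", "35010")
    .getOrCreate()
)
```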
0
votes
0 answers

How to fix error when launching PySpark in standalone mode

I'm new to PySpark and I tried to launch a PySpark standalone cluster. I launched the master using: bin\spark-class2.cmd org.apache.spark.deploy.master.Master. I launched the worker using: bin\spark-class2.cmd org.apache.spark.deploy.worker.Worker -c 2…