Questions tagged [apache-spark-standalone]

Use for questions related to Apache Spark standalone deploy mode (not local mode).

This tag should be used for questions specific to the standalone deploy mode, such as cluster orchestration in standalone mode, or standalone-specific features and configuration options.

Spark standalone mode is an alternative to running Spark on Mesos or YARN. It provides a simpler option than the more sophisticated resource managers, which can be useful on a dedicated Spark cluster (i.e. one not running other jobs).

"Standalone" speaks to the nature of running "alone" without an external resource manager.

164 questions
4 votes · 0 answers

Apache Spark: History server (logging) + non super-user access (HDFS)

I have a working HDFS and a running Spark framework on a remote server. I am running SparkR applications and hope to see the logs of completed applications in the UI as well. I followed all the instructions here: Windows: Apache Spark History Server Config and…
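For context, completed applications only appear in the history server if event logging is enabled. A minimal sketch of the relevant settings (the HDFS path is a placeholder; the directory must pre-exist and be writable by the non-super user running the application):

    from pyspark.sql import SparkSession

    # Event logs let the history server replay the UI of finished apps.
    # "hdfs://namenode:8020/spark-logs" is a placeholder path.
    spark = (
        SparkSession.builder
        .appName("history-logging-example")
        .config("spark.eventLog.enabled", "true")
        .config("spark.eventLog.dir", "hdfs://namenode:8020/spark-logs")
        .getOrCreate()
    )
    spark.stop()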
3 votes · 0 answers

How to get the result from Spark after submitting a job via the REST API?

When I submit a Spark job through the API /v1/submissions/create on port 6066 and check its status via /v1/submissions/status/{driver-id}, I only get something like this: { "action" : "SubmissionStatusResponse", "driverState" : "FINISHED", …
MegaOwIer · 98 · 1 · 7
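As the excerpt above shows, the status endpoint only reports the driver's state, not the job's output, so results have to be written to external storage by the job itself. A minimal polling sketch (host and driver id are placeholders):

    import json
    import time
    import urllib.request

    MASTER = "http://master-host:6066"        # placeholder host
    DRIVER_ID = "driver-20240101000000-0000"  # placeholder driver id

    def driver_state(master: str, driver_id: str) -> str:
        # GET /v1/submissions/status/<driver-id> returns JSON with a
        # "driverState" field such as RUNNING, FINISHED or FAILED.
        url = f"{master}/v1/submissions/status/{driver_id}"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["driverState"]

    while driver_state(MASTER, DRIVER_ID) == "RUNNING":
        time.sleep(5)
    print(driver_state(MASTER, DRIVER_ID))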
3 votes · 0 answers

Standalone Spark - How to find the driver's final status for an application

I am setting up Spark 2.2.0 in standalone mode (https://spark.apache.org/docs/latest/spark-standalone.html) and submitting Spark jobs programmatically using SparkLauncher sparkAppLauncher = new…
3 votes · 1 answer

Apache Spark method not found sun.nio.ch.DirectBuffer.cleaner()Lsun/misc/Cleaner;

I encounter this problem while running an automated data processing script in spark-shell. The first couple of iterations work fine, but sooner or later it always runs into this error. I googled the issue but haven't found an exact match. Other…
3 votes · 0 answers

Connecting to remote Spark Cluster

I'm having a problem connecting to a Spark cluster remotely from a Jupyter notebook. It works fine locally. Method 1: conf = pyspark.SparkConf().setAppName('Pi').setMaster('spark://my-cluster:7077') sc = pyspark.SparkContext(conf=conf) This returns…
beginner_ · 7,230 · 18 · 70 · 127
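One common cause of this symptom (an assumption here, not necessarily the asker's problem) is that executors cannot connect back to the driver running in the notebook. A sketch of the relevant settings, with all host names as placeholders:

    import pyspark

    conf = (
        pyspark.SparkConf()
        .setAppName("Pi")
        .setMaster("spark://my-cluster:7077")
        # Address of the notebook machine as seen *from the cluster*:
        .set("spark.driver.host", "notebook-host.example.com")
        # Listen on all interfaces inside the notebook machine/container:
        .set("spark.driver.bindAddress", "0.0.0.0")
    )
    sc = pyspark.SparkContext(conf=conf)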
3 votes · 0 answers

Buffer/cache exhaustion with Spark standalone inside a Docker container

I have a very weird memory issue (which is what a lot of people will most likely say ;-)) with Spark running in standalone mode inside a Docker container. Our setup is as follows: we have a Docker container in which we have a Spring Boot…
3 votes · 3 answers

Why is Spark utilizing only one core per executor? What, other than the number of partitions, decides how many cores are utilized?

I am running Spark in an HPC environment on Slurm, using Spark standalone mode, Spark version 1.6.1. The problem is that my Slurm node is not fully used in standalone mode. I am using spark-submit in my Slurm script. There are 16 cores available on…
Laeeq · 357 · 1 · 4 · 15
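In standalone mode the settings below govern how many cores an application and its executors take. A sketch with example values for a 16-core worker (the numbers are illustrative, not a recommendation):

    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .setMaster("spark://master-host:7077")  # placeholder host
        .setAppName("core-usage-example")
        .set("spark.executor.cores", "4")       # cores per executor
        .set("spark.cores.max", "16")           # cores for the whole app
    )
    sc = SparkContext(conf=conf)

    # Cores are only kept busy if there are at least as many partitions:
    rdd = sc.parallelize(range(1_000_000), numSlices=16)
    print(rdd.map(lambda x: x * x).count())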
3 votes · 4 answers

Spark master won't show a running application in the UI when I use spark-submit for a Python script

The image shows the 8081 UI. The master shows a running application when I start a Scala shell or a pyspark shell. But when I use spark-submit to run a Python script, the master doesn't show any running application. This is the command I used: spark-submit…
kavya · 759 · 4 · 14 · 31
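A frequent explanation (again an assumption, not a diagnosis of this question) is that spark-submit without --master defaults to local mode, so the application never registers with the standalone master and its UI stays empty. Setting the master explicitly, via --master or in the script, avoids that:

    from pyspark import SparkConf, SparkContext

    # With no master configured anywhere, spark-submit runs the script in
    # local mode; pointing at the standalone master (placeholder host)
    # makes the app appear in the master's web UI.
    conf = SparkConf().setAppName("visible-app").setMaster("spark://master-host:7077")
    sc = SparkContext(conf=conf)
    print(sc.parallelize(range(100)).sum())
    sc.stop()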
3 votes · 1 answer

Forcing the driver to run on a specific slave in a Spark standalone cluster running with "--deploy-mode cluster"

I am running a small Spark cluster with two EC2 instances (m4.xlarge). So far I have been running the Spark master on one node and a single Spark slave (4 cores, 16 GB memory) on the other, then deploying my Spark (streaming) app in client…
Adam Dossa · 228 · 1 · 8
3 votes · 1 answer

Is FAIR available for Spark Standalone cluster mode?

I have a 2-node cluster with the Spark standalone cluster manager. I'm triggering more than one job using the same sc with Scala multithreading. What I found is that my jobs are scheduled one after another because of the FIFO nature, so I tried to use FAIR…
Balaji Reddy · 5,576 · 3 · 36 · 47
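FAIR scheduling of jobs inside one application is a property of the application's own scheduler, so it works under the standalone cluster manager as well. A minimal multi-threaded sketch (master URL and pool names are placeholders):

    import threading
    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .setMaster("spark://master-host:7077")
        .setAppName("fair-example")
        .set("spark.scheduler.mode", "FAIR")
    )
    sc = SparkContext(conf=conf)

    def run_job(pool: str):
        # The pool is a thread-local property, so each thread tags its jobs.
        sc.setLocalProperty("spark.scheduler.pool", pool)
        print(pool, sc.parallelize(range(10_000)).sum())

    threads = [threading.Thread(target=run_job, args=(f"pool{i}",)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()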
3 votes · 1 answer

java.lang.IllegalStateException: Cannot find any build directories

I want to run the Spark master and worker in IntelliJ. I have started the Spark master and worker successfully, and the worker connects to the master without any problem. I can confirm this by looking at the logs and the Spark web UI. But the problem starts…
3 votes · 3 answers

Continuous "INFO JobScheduler:59 - Added jobs for time *** ms" in my Spark standalone cluster

We are working with a Spark standalone cluster with 8 cores and 32 GB RAM per node, a 3-node cluster with the same configuration. Sometimes a streaming batch completes in less than 1 second; sometimes it takes more than 10 seconds, and at those times the log below appears…
3 votes · 1 answer

Role of the Executors on the Spark master machine

In a Spark standalone cluster, does the master node run tasks as well? I wasn't sure whether executor processes are spun up on the master node and do work alongside the worker nodes. Thanks!
Ranjit Iyer · 857 · 1 · 11 · 20
2 votes · 2 answers

Is writing to the database done by the driver or the executors in a Spark cluster?

I have a Spark cluster set up with 1 master node and 2 worker nodes. I am running a pyspark application on this Spark standalone cluster, where I have a job that writes the transformed data into a MySQL database. So, I have a question here about whether writing…
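In short, the driver only plans the write; each partition is written to the database by an executor task. A sketch of a JDBC write (connection details are placeholders, and the MySQL JDBC driver jar must be available to the executors, e.g. via --jars):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-write-example").getOrCreate()

    df = spark.range(1000).withColumnRenamed("id", "value")

    # Each of df's partitions is written by a task running on an executor;
    # the rows are never funnelled through the driver.
    (
        df.write
        .format("jdbc")
        .option("url", "jdbc:mysql://db-host:3306/mydb")  # placeholder
        .option("dbtable", "results")
        .option("user", "user")
        .option("password", "secret")
        .mode("append")
        .save()
    )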
2 votes · 2 answers

Hadoop 3 gcs-connector doesn't work properly with the latest version of Spark 3 in standalone mode

I wrote a simple Scala application which reads a parquet file from a GCS bucket. The application uses: JDK 17, Scala 2.12.17, Spark SQL 3.3.1, and the gcs-connector for hadoop3-2.2.7. The connector is taken from Maven, imported via sbt (the Scala build tool). I'm…
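For reference, a minimal sketch of reading parquet from GCS on a standalone cluster, assuming the connector version named in the question; bucket and keyfile paths are placeholders, and the connector jar must be visible to both the driver and the executors:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("gcs-read-example")
        .config("spark.jars.packages",
                "com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.7")
        .config("spark.hadoop.fs.gs.impl",
                "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
        .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
                "/path/to/key.json")  # placeholder credentials file
        .getOrCreate()
    )

    df = spark.read.parquet("gs://my-bucket/path/to/data.parquet")  # placeholder
    df.show()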