Questions tagged [spark-ui]

The web interface of a running Spark application, used to monitor and inspect Spark job executions in a web browser.

76 questions
1
vote
0 answers

Spark UI SQL view shows almost nothing

I'm trying to optimize a program with Spark SQL. The program is basically one HUGE SQL query (it joins about 10 tables, with many cases, etc.). I'm more used to DF-API-oriented programs, and those showed the different stages much better. It's…
BiS
1
vote
0 answers

How to know what kind of work each Spark task/executor runs

When my application runs on a Spark cluster, I know the following: 1) the execution plan; 2) the DAG with nodes as RDDs or operations; 3) all jobs/stages/executors/tasks. However, I cannot find out, given a task ID, what kind of work (RDD or…
Joe C
1
vote
1 answer

Understanding Spark UI for a streaming application

I am trying to understand what the entries in my Spark UI signify. Calling an action results in the creation of a job. I am finding it hard to understand: How many of these jobs get created? Is that proportional to the number of micro-batches? What…
fledgling
0
votes
0 answers

Spark SQL: why am I seeing a total of 5 Spark jobs for a simple SQL query?

As per my understanding, there will be one job for each action in Spark. But I often see more than one job triggered for a single action. I was trying to test this by doing a simple aggregation on a dataset to get distinct values for dep…
Vinit89
0
votes
1 answer

Spark number of tasks not equal to number of partitions

I have read that the number of partitions is related to the number of tasks. When I read the query plan of any job that is not the file-reading job (for instance, the merge job of a join), I do see that it gets as many tasks as the number of partitions of…
0
votes
0 answers

Spark UI: Executors tab is empty

I'm running the Spark UI on my web server, by exporting the following SPARK_HISTORY_OPTS: export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=${LOCAL_PATH_LOGS} -Dspark.history.ui.port=18080 -Dspark.eventLog.enabled=true…
br1
0
votes
0 answers

Definition of Peak JVM Memory OnHeap on Spark UI

I'm under the impression that if I set spark.executor.memory to 50G, then, going by this formula and this article, the maximum heap will be 30.3G. But the peak JVM onHeap I'm seeing in the Spark UI is 39.3 GiB. Can I get some help to explain what is…
Cen
0
votes
0 answers

Spark application UI becomes unavailable

Apache Spark is deployed via the Bitnami Helm chart on Kubernetes. The reverse proxy for the UI is enabled and working fine (except for some dead links / rewrites). After running fine for a while, the application detail UI becomes unavailable and shows…
0
votes
0 answers

Spark standalone HA mode: how to proxy all Spark UI requests to the leader node

Env: Spark 3.3.1, nginx 1.18.0. I have two Spark master nodes working in HA mode; one is the leader, the other is standby. I want to use the Spark UI to view information about Spark workers and Spark drivers. But the problem is, if I visit a Spark UI on…
刘思凡
0
votes
0 answers

Understanding Spark UI details for a small pyspark snippet

I am trying to understand the job, stage and task details shown in the Spark UI for the following PySpark snippet: # Disabling adaptive execution to see 200 shuffle partitions show up spark.conf.set('spark.sql.adaptive.enabled', 'false') df =…
user3138594
0
votes
1 answer

Docker build is failing

I am trying to copy a CA certificate stored on my Windows machine into a Docker container, but the image build is failing. I do not understand how to copy the certificate from the Windows location to the recommended Linux location. FROM…
pbh
0
votes
0 answers

Problems with setting up PySpark packages on a MacBook

I am currently going through LinkedIn Learning's Data Engineering Foundations course and I cannot run the files. One such file is: ` ##import required libraries import pyspark ##create spark session spark = pyspark.sql.SparkSession \ .builder…
Festo
0
votes
1 answer

Can't access spark UI even if I use the ip address reported in the console

I cannot access the web UI of my containerized Spark cluster even if I copy and paste the following IP address: Stopped Spark web UI at http://987c8b219d32:4040. The cluster that I've built is taken from this tutorial: Spark Cluster Tutorial
0
votes
0 answers

Dataproc PHS Yarn RM UI not able to read logs from remote-app-log-dir

I am working on setting up a Dataproc PHS for my Spark and Hive applications. I was able to set up the Spark History Server in a standalone Dataproc cluster (PHS) by setting up the following…
0
votes
0 answers

Increasing spark workers and cassandra nodes takes more time

So, I have a small cluster with 3 Spark workers (2 executors each), and on the same nodes I have also installed Cassandra in order to achieve data locality. In order to evaluate the speed and timings (from the Spark UI), I run the same code with, firstly, one…