Questions tagged [spark-ui]

The web interface of a running Spark application, used to monitor and inspect Spark job executions in a web browser.

76 questions
3 votes · 2 answers

Understanding Event Timeline of Spark UI

I have a job running that shows the Event Timeline as follows. I am trying to understand the gaps between these single lines; they seem to be parallel but not immediately sequential with other stages... Any other insight from this, and what is the cluster…
Aakash Basu · 1,689
2 votes · 1 answer

How to find out when Spark Application has any memory/disk spills without checking Spark UI

My environment: Databricks 10.4, PySpark. I'm looking into Spark performance, specifically the memory/disk spills that are shown in the Spark UI Stages section. What I want to achieve is to get notified if my job had spills. I have…
BI Dude · 1,842
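Short of opening the UI, the same stage-level counters are exposed by Spark's monitoring REST API: `/api/v1/applications/<app-id>/stages` returns each stage's `memoryBytesSpilled` and `diskBytesSpilled`. A minimal sketch of a spill check (the base URL and the alerting hook around it are assumptions; adapt to your cluster):

```python
import json
from urllib.request import urlopen

def stages_with_spills(stages):
    """Return the stages that spilled to memory or disk.

    `stages` is the JSON list served by the Spark REST endpoint
    /api/v1/applications/<app-id>/stages; each entry carries the
    memoryBytesSpilled / diskBytesSpilled counters shown in the UI.
    """
    return [
        s for s in stages
        if s.get("memoryBytesSpilled", 0) > 0 or s.get("diskBytesSpilled", 0) > 0
    ]

def check_app_for_spills(ui_base_url, app_id):
    # ui_base_url is e.g. "http://driver-host:4040" (assumption: the
    # driver UI is reachable from wherever this monitor runs).
    with urlopen(f"{ui_base_url}/api/v1/applications/{app_id}/stages") as resp:
        stages = json.load(resp)
    return stages_with_spills(stages)

if __name__ == "__main__":
    # Offline demo on a hand-written sample payload:
    sample = [
        {"stageId": 1, "memoryBytesSpilled": 0, "diskBytesSpilled": 0},
        {"stageId": 2, "memoryBytesSpilled": 1048576, "diskBytesSpilled": 524288},
    ]
    print([s["stageId"] for s in stages_with_spills(sample)])  # [2]
```

On Databricks the driver UI is not always reachable from outside the workspace, so the usual pattern is to run a check like this from a scheduled notebook or job in the same workspace.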
2 votes · 1 answer

Pyspark monitoring metrics not making sense

I am trying to understand the Spark UI and HDFS UI while using PySpark. Following are the properties for the session I am running: pyspark --master yarn --num-executors 4 --executor-memory 6G --executor-cores 3 --conf…
figs_and_nuts · 4,870
2 votes · 2 answers

How to increase Jetty's header buffer size in the Spark UI reverse proxy

I'm getting "HTTP ERROR 502 Bad Gateway" when I click on a worker link in my standalone Spark UI. Looking at the master logs I can see a corresponding message... HttpSenderOverHTTP.java:219 Generated headers (4096 bytes), chunk (-1 bytes), content…
Martin Stone · 12,682
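Spark exposes the Jetty request-header limit through the `spark.ui.requestHeaderSize` setting (default 8k, available since Spark 2.2.3/2.3.2/2.4.0), which is the usual knob when the reverse proxy generates headers larger than the buffer. A sketch of a `spark-defaults.conf` fragment for the master and workers (the 16k value is an assumption; size it to the headers in your logs):

```
spark.ui.requestHeaderSize  16k
```

For a standalone master or worker started outside spark-submit, the same property can be passed via `SPARK_MASTER_OPTS` / `SPARK_WORKER_OPTS` as `-Dspark.ui.requestHeaderSize=16k`.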
2 votes · 1 answer

Why do I see two jobs in Spark UI for a single read?

I am trying to run the below script to load a file with 24k records. Is there any reason why I am seeing two jobs for a single load in the Spark UI? Code: from pyspark.sql import SparkSession spark = SparkSession\ .builder\ .appName("DM")\ …
user16344431
2 votes · 1 answer

How can I get DAG of Spark Sql Query execution plan?

I am doing some analysis on Spark SQL query execution plans. The execution plans that the explain() API prints are not very readable. In the Spark web UI, a DAG graph is created which is divided into jobs, stages, and tasks and is much more readable. Is…
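The data behind the SQL tab's DAG is also available programmatically: Spark 3.1+ serves it from the REST endpoint `/api/v1/applications/<app-id>/sql?details=true`, where each execution carries `nodes` and `edges` describing the plan graph. A sketch that turns one execution record into readable edges (the field names follow that REST shape; the sample record is hand-written):

```python
def dag_edges(execution):
    """Render one /sql execution (details=true) as 'from -> to' edge strings.

    Assumes the Spark 3.x REST shape: `nodes` entries with nodeId/nodeName
    and `edges` entries with fromId/toId.
    """
    names = {n["nodeId"]: n["nodeName"] for n in execution.get("nodes", [])}
    return [
        f'{names[e["fromId"]]} -> {names[e["toId"]]}'
        for e in execution.get("edges", [])
    ]

# Offline demo with a hand-written execution record:
sample = {
    "nodes": [
        {"nodeId": 0, "nodeName": "Scan csv"},
        {"nodeId": 1, "nodeName": "Filter"},
        {"nodeId": 2, "nodeName": "HashAggregate"},
    ],
    "edges": [{"fromId": 0, "toId": 1}, {"fromId": 1, "toId": 2}],
}
print(dag_edges(sample))  # ['Scan csv -> Filter', 'Filter -> HashAggregate']
```

The same JSON could be fed into a graph library or Graphviz if an actual picture, rather than the UI's rendering, is what is needed.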
2 votes · 0 answers

Why is executor memory used shown as greater than total available memory in the Spark web UI?

I have a Spark Structured Streaming job that has been running for around the last 3 weeks. When I open the Executors tab in the Spark web UI, it shows memory used: 36.1GB, total available memory for storage: 3.2GB. For this application, executor memory is set…
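Part of the confusion here is that the "storage" total in the Executors tab is not the executor heap: it is the unified memory region, roughly `(heap - 300 MB) * spark.memory.fraction` (default 0.6), so it is always much smaller than the configured executor memory. A sketch of that arithmetic under an assumed 6 GiB heap (the heap size is an assumption for illustration, not taken from the question):

```python
RESERVED_MB = 300          # fixed reserved memory in the unified memory manager
MEMORY_FRACTION = 0.6      # default spark.memory.fraction

def storage_pool_mb(executor_heap_mb,
                    reserved_mb=RESERVED_MB,
                    memory_fraction=MEMORY_FRACTION):
    """Approximate the 'Storage Memory' total shown in the Executors tab."""
    return (executor_heap_mb - reserved_mb) * memory_fraction

# With an assumed 6 GiB executor heap:
print(round(storage_pool_mb(6 * 1024)))  # 3506 MB, i.e. ~3.4 GiB
```

A cumulative "memory used" figure far above that pool on a weeks-old streaming job is therefore not by itself evidence of a leak; the two numbers measure different things.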
2 votes · 2 answers

Refusing to display LOCALHOST in a frame because 'X-Frame-Options' set to 'sameorigin'

This question specifically regards localhost. I am trying to embed a localhost web page in another localhost web page; however, it states that this cannot be done. This was the message in Chrome developer tools: Refused to display…
2 votes · 1 answer

Spark UI -> SQL tab doesn't show all (older) stages

I am executing a Spark (SQL) job which has many stages (~150). It is written primarily in Spark SQL within an internal framework that chains the SQLs using temporary views and DataFrames. For the initial intermediate table writes, I can see a…
sujit · 2,258
2 votes · 2 answers

What is 'Active Jobs' in Spark History Server Spark UI Jobs section

I'm trying to understand the Spark History Server components. I know that the History Server shows completed Spark applications. Nonetheless, I see 'Active Jobs' set to 1 for a completed Spark application. I'm trying to understand what 'Active Jobs'…
Ash · 1,180
2 votes · 1 answer

Spark local mode: How to query the number of executor slots?

I'm following the tutorial Using Apache Spark 2.0 to Analyze the City of San Francisco's Open Data, where it's claimed that the "local mode" Spark cluster available in Databricks "Community Edition" provides you with 3 executor slots. (So 3 tasks should…
das-g · 9,718
1 vote · 1 answer

PySpark: get the max value of a CSV column the quickest way possible

I am trying to get the max value of a column using this: df.agg(max(col('some_integer_column')), min(col('some_integer_column'))). The df comes from a CSV file, which I know would be much easier and faster if it were Parquet/Delta. As the CSV file needs…
1 vote · 0 answers

WholeStageCodegen’s min duration larger than the query duration

I found in the Spark UI that the min duration of the WholeStageCodegen part is larger than the duration of the query. I think that does not make sense, right? Now I want to examine where those total, min, max values are calculated in the…
1 vote · 0 answers

Spark on YARN: error while closing the SparkContext

My Spark application runs in a YARN Hadoop cluster. After completing its tasks and attempting to close the SparkContext, my application encounters an error: 2023-06-05 12:30:43,361 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode:…
1 vote · 0 answers

I have code running in a GCP cluster and I am trying to connect to the Spark UI, but it says it cannot connect to port 8080

bind [::1]:8080: Cannot assign requested address Linux data-eng-m 5.10.0-0.deb10.16-amd64 #1 SMP Debian 5.10.127-2~bpo10+1 (2022-07-28) x86_64 This is the error that I keep getting. I created my application in a notebook running on the cluster, but…
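A bind failure like this usually means the process is trying to claim a loopback address/port it cannot get. On a managed GCP (Dataproc) cluster Spark typically runs on YARN, so there is no standalone master UI on 8080; the driver UI defaults to 4040 and can be moved when that port is contested via `spark.ui.port`. A hedged config fragment (the port number is illustrative; pick any free port):

```
# assumption: 4050 is an arbitrary free port on the driver node
spark.ui.port  4050
```

The UI then has to be reached on the driver node itself, e.g. through an SSH tunnel to the cluster, rather than on the notebook host.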