
I am executing a Spark (SQL) job that has lots of stages (~150). It is written primarily in Spark SQL within an internal framework that chains the SQL queries using temporary views and DataFrames. For the initial intermediate table writes, I can see a detailed view in Spark UI -> SQL tab. But for the later table writes, the SQL tab just shows a UI of the form below.

What is the reason for this, and is there a parameter I can set to get a detailed graphical view in the SQL tab?

My Spark version: 2.3

EDIT (17 Jan 2020): I found a JIRA issue, https://issues.apache.org/jira/browse/SPARK-30064, but I am not sure it's related, since it mentions the JDBC data source, which I am not using.

[Screenshot: the SQL tab view shown for the later table writes]

sujit

1 Answer


Check out https://spark.apache.org/docs/2.3.4/configuration.html#spark-ui. Specifically, I suspect that for this issue you may have spark.ui.retainedStages (default 1000) and/or spark.ui.retainedTasks (default 100k) set too low.
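A minimal sketch of raising these limits, assuming the job is launched from a Python driver (the question does not say which API the framework uses). The values are purely illustrative, not recommendations. Larger values keep more UI state in driver memory. These settings are read when the SparkContext starts, so they have to be passed to spark-submit or set before the session is created. `UI_RETENTION_CONF`, `spark_submit_args` and `build_session` are hypothetical names.

```python
# Illustrative (hypothetical) values; Spark 2.3 defaults noted alongside.
UI_RETENTION_CONF = {
    "spark.ui.retainedStages": "5000",          # default 1000
    "spark.ui.retainedTasks": "1000000",        # default 100000
    "spark.sql.ui.retainedExecutions": "5000",  # default 1000
}


def spark_submit_args(conf):
    """Turn a config dict into spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", "{}={}".format(key, value)])
    return args


def build_session(app_name, conf=UI_RETENTION_CONF):
    """Create a SparkSession with the given settings (hypothetical helper).

    Must run before any SparkContext exists: if getOrCreate() returns an
    existing session, these startup-time settings will not take effect.
    """
    # Imported lazily so this module can be loaded without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


print(" ".join(spark_submit_args(UI_RETENTION_CONF)))
```

The printed string can be pasted into a spark-submit command line. Passing the settings at launch time avoids the "existing SparkContext" pitfall entirely.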

If your job has 150 stages and, for example, each stage has 1000 tasks on average, then the whole job would have 150 * 1000 = 150k tasks, which is over the default 100k limit. So you would not see those older tasks, stages, etc. in the Spark UI.
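The estimate above can be written as a tiny calculation. This is only a back-of-the-envelope sketch. Like the paragraph above, it treats spark.ui.retainedTasks as a budget for the whole job, and the function names are hypothetical.

```python
# Spark 2.3 default for spark.ui.retainedTasks
DEFAULT_RETAINED_TASKS = 100000


def estimated_total_tasks(num_stages, avg_tasks_per_stage):
    """Rough task count for the whole job."""
    return num_stages * avg_tasks_per_stage


def exceeds_task_retention(num_stages, avg_tasks_per_stage,
                           retained_tasks=DEFAULT_RETAINED_TASKS):
    """True if the estimate is above the limit, so older UI entries may be dropped.

    Assumes the limit applies to the job-wide total, as the estimate above does.
    """
    return estimated_total_tasks(num_stages, avg_tasks_per_stage) > retained_tasks


print(estimated_total_tasks(150, 1000))   # 150000
print(exceeds_task_retention(150, 1000))  # True
```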

PS. Also, for Spark jobs with such a large number of stages (e.g., when you chain a lot of DataFrames iteratively), we often find that checkpointing helps a lot. For example, you could checkpoint every 20-50 iterations (if there is a loop that creates that huge lineage; experiment to find the number that works best for your case), which essentially splits the huge 150-stage job into chunks of 20-50 stages. The Spark optimizer may have a hard time going through a DAG of 150 DataFrames to create an optimal plan.

https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-checkpointing.html
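A minimal sketch of periodic checkpointing, assuming the lineage is built by applying some step function in a loop. In the question it is built by chaining temporary views, so `step` and `iterate_with_checkpoints` are hypothetical stand-ins. `df` is expected to be a `pyspark.sql.DataFrame`. `DataFrame.checkpoint(eager=True)` materializes it to the checkpoint directory and returns a DataFrame whose plan no longer carries the earlier transformations.

```python
def iterate_with_checkpoints(df, step, n_iterations, checkpoint_every=25):
    """Apply `step` n_iterations times, truncating the lineage periodically.

    `step` is a hypothetical function DataFrame -> DataFrame.
    """
    if checkpoint_every < 1:
        raise ValueError("checkpoint_every must be >= 1")
    for i in range(1, n_iterations + 1):
        df = step(df)
        if i % checkpoint_every == 0:
            # Eagerly materialize and cut the plan at this point.
            df = df.checkpoint(eager=True)
    return df
```

A checkpoint directory has to be set first, e.g. `spark.sparkContext.setCheckpointDir(...)` with a path on reliable storage such as HDFS. Each checkpoint writes the data out, so the interval trades extra I/O against a shorter plan for the optimizer.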

Tagar
  • No luck. I tried larger values of the above params and also `spark.sql.ui.retainedExecutions`, but the observation doesn't change. Perhaps it has to do with the disclaimer on the doc page: "This is a target maximum, and __fewer elements may be retained in some circumstances__." – sujit Dec 28 '19 at 06:19
  • What did you try? How far did you tune those up? – Tagar Dec 30 '19 at 19:43
  • All defaults * 1000 – sujit Jan 06 '20 at 05:14