
There are 4 major actions (JDBC writes) in the application, plus a few counts, which together take around 4-5 minutes to complete. But the total uptime of the application is around 12-13 minutes.

I see certain jobs named "run at ThreadPoolExecutor.java:1149" in the Spark UI. The long invisible delays occur just before these jobs show up in the UI.

I want to know the possible causes of these delays. My application reads 8-10 CSV files and 5-6 views from tables. There are around 59 joins, a few groupBy with agg(sum), and 3 unions.

I am not able to reproduce the issue in the DEV/UAT environments since the data volume there is much smaller. It only shows up in production, where I have to ask my manager to run the app for me.

If anyone has come across such delays in their jobs, please share your experience of what the potential cause could be. Currently I am working around the unions, i.e. caching the associated DataFrames and calling count() on them so that the subsequent union benefits from the cache (yet to test whether the unions are the reason for the delays).

Similarly, I tried breaking the long chain of transformations with cache() and count() calls in between, to cut the long lineage. The runtime dropped from the initial 18 minutes to 12 minutes, but the invisible delays still persist. A minimal sketch of the pattern I am using follows.
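(For illustration only: the file paths, column names, and join key below are made up, not the real job; this just shows the cache-and-count pattern described above.)

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("lineage-break").getOrCreate()

// Hypothetical inputs standing in for the real CSVs/views.
val left: DataFrame  = spark.read.option("header", "true").csv("/data/left.csv")
val right: DataFrame = spark.read.option("header", "true").csv("/data/right.csv")

// Materialize the intermediate result so downstream stages (e.g. the unions)
// reuse the cached rows instead of replaying the full lineage.
val joined = left.join(right, Seq("id"))
joined.cache()
joined.count() // forces evaluation and populates the cache

// Later operations read from the cache rather than recomputing the plan.
val combined = joined.union(joined.filter(col("amount") > 0))
```

As an aside, df.checkpoint() (after spark.sparkContext.setCheckpointDir(...)) is another way to truncate lineage, at the cost of writing the data out to the checkpoint directory.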

Thanks in advance

1 Answer


I assume you don't have CPU- or IO-heavy code running between your Spark jobs. So if it really is Spark, 99% it is a query planning delay. You can use spark.listenerManager.register(QueryExecutionListener) to check different metrics of query planning performance.
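A minimal sketch of such a listener, assuming Spark 2.x (the onFailure parameter type changed to Throwable in Spark 3.0, so adjust accordingly):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

val spark = SparkSession.builder().appName("planning-metrics").getOrCreate()

spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    // durationNs spans the whole action; comparing it with the job durations
    // shown on the Spark UI hints at how much time went to planning rather
    // than to the jobs themselves.
    println(s"action=$funcName took ${durationNs / 1e6} ms")
    // The optimized plan's size is a rough proxy for planning cost:
    // println(qe.optimizedPlan.numberedTreeString)
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {
    println(s"action=$funcName failed: ${exception.getMessage}")
  }
})
```

With 59 joins feeding into unions, the logical plan can get very large, and Catalyst analysis/optimization time can grow accordingly, which would match delays appearing before any job shows up in the UI.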

Grigoriev Nick