
There are 4 major actions (JDBC writes) in the application, plus a few counts, which together take around 4-5 minutes to complete. But the total uptime of the application is around 12-13 minutes.

I see certain jobs named "run at ThreadPoolExecutor.java:1149" in the Spark UI. The long invisible delays occur just before these jobs show up in the UI.

I want to know the possible causes of these delays. My application reads 8-10 CSV files and 5-6 views from tables. There are around 59 joins, a few groupBy with agg(sum), and 3 unions.

I am not able to reproduce the issue in the DEV/UAT environments since the data volume there is much smaller. It only shows up in production, where I have to ask my manager to run the app for me.

If anyone has come across such delays in their jobs, please share your experience of what the potential cause could be. Currently I am working around the unions, i.e. caching the associated DataFrames and calling count() on them so that the subsequent union benefits from the cache (yet to test whether the unions are the reason for the delays).

Similarly, I tried breaking the long chain of transformations with cache() and count() calls in between, to cut the long lineage. The runtime dropped from the initial 18 minutes to 12 minutes, but the invisible delays still persist. A minimal sketch of the pattern I am using follows.
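(For illustration only: the file paths, column names, and join key below are made up, not the real job; this just shows the cache-and-count pattern described above.)

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("lineage-break").getOrCreate()

// Hypothetical inputs standing in for the real CSVs/views.
val left: DataFrame  = spark.read.option("header", "true").csv("/data/left.csv")
val right: DataFrame = spark.read.option("header", "true").csv("/data/right.csv")

// Materialize the intermediate result so downstream stages (e.g. the unions)
// reuse the cached rows instead of replaying the full lineage.
val joined = left.join(right, Seq("id"))
joined.cache()
joined.count() // forces evaluation and populates the cache

// Later operations read from the cache rather than recomputing the plan.
val combined = joined.union(joined.filter(col("amount") > 0))
```

As an aside, df.checkpoint() (after spark.sparkContext.setCheckpointDir(...)) is another way to truncate lineage, at the cost of writing the data out to the checkpoint directory.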

Thanks in advance

1 Answer


I assume you don't have CPU- or IO-heavy code running between your Spark jobs. So if it really is Spark, 99% it is a query planning delay. You can use spark.listenerManager.register(QueryExecutionListener) to check different metrics of query planning performance.
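A minimal sketch of such a listener, assuming Spark 2.x (the onFailure parameter type changed to Throwable in Spark 3.0, so adjust accordingly):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

val spark = SparkSession.builder().appName("planning-metrics").getOrCreate()

spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    // durationNs spans the whole action; comparing it with the job durations
    // shown on the Spark UI hints at how much time went to planning rather
    // than to the jobs themselves.
    println(s"action=$funcName took ${durationNs / 1e6} ms")
    // The optimized plan's size is a rough proxy for planning cost:
    // println(qe.optimizedPlan.numberedTreeString)
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {
    println(s"action=$funcName failed: ${exception.getMessage}")
  }
})
```

With 59 joins feeding into unions, the logical plan can get very large, and Catalyst analysis/optimization time can grow accordingly, which would match delays appearing before any job shows up in the UI.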

Grigoriev Nick