I've got a complex software which performs really complex SQL queries (well not queries, Spark plans you know). <-- The plans are dynamic, they change based on user input so I can't "cache" them.
I've got a phase in which spark takes 1.5-2min building the plan. Just to make sure, I added "logXXX", then explain(true), then "logYYY" and it takes 1minute 20 seconds for the explain to execute.
I've trying breaking the lineage but this seems to cause worse performance because the actual execution time becomes longer.
I can't parallelize driver work (already did, but this task can't be overlapped with anything else).
Any ideas/guide on how to improve the plan builder in Spark? (like for example, flags to try enabling/disabling and such...)
Is there a way to cache plans in Spark? (so I can run that in parallel and then execute it)
I've tried disabling all possible optimizer rules, setting min iterations to 30... but nothing seems to affect that concrete point :S
I tried disabling wholeStageCodegen and it helped a little, but the execution is longer so :).
Thanks!,
PS: The plan does contain multiple unions (<20, but quite complex plans inside each union) which are the cause for the time, but splitting them apart also affects execution time.