I'm trying to optimize one program with Spark SQL, this program is basically a HUGE SQL query (joins like 10 tables with many cases etc etc). I'm more used to more DF-API-oriented programs, and those did show the different stages much better.
It's quite well structured and I understand it more or less. However I have a problem, I always use Spark UI SQL view to get hints on where to focus the optimizations.
However in this kind of program Spark UI SQL shows nothing, is there a reason for this? (or a way to force it to show).
I'm expecting to see each join/scan with the number of output rows after it and such.... but I only see a full "WholeStageCodeGen" for a "Parsed logical plan" which is like 800lines
I can't show code, it has the following "points":
1- Action triggering it, its "show"(20)
3- Takes like 1 hour of execution (few executors yet)
2- has a persist before the show/action.
3- Uses Kudu, Hive and In-memory tables (registered before this query)
4- Has like 700 lines logical plan
Is there a way to improve the tracing there? (maybe disabling WholeStageCodegen?, but that may hurt performance...)
Thanks!