I work on an ETL development team where we use Spark SQL to transform data by building a chain of intermediate temporary views, one on top of another, ending with a final temp view whose data is then written to the target table's folder.
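For context, here is a simplified sketch of the kind of pipeline I mean (the paths, view names, and columns are made up purely for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-example").getOrCreate()

# Source data registered as the first temp view (hypothetical path and name)
spark.read.parquet("/data/source/orders").createOrReplaceTempView("stg_orders")

# Intermediate temp views built in sequence, each on top of the previous one
spark.sql("""
    SELECT order_id, customer_id, amount
    FROM stg_orders
    WHERE order_status = 'COMPLETE'
""").createOrReplaceTempView("tv_filtered_orders")

spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM tv_filtered_orders
    GROUP BY customer_id
""").createOrReplaceTempView("tv_customer_totals")

# Final temp view written out to the target table folder
spark.table("tv_customer_totals") \
     .write.mode("overwrite") \
     .parquet("/data/target/customer_totals")
```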
However, on several occasions our queries take an excessive amount of time even when dealing with a small number of records (under ~10K), and we end up scrambling for help in all directions.
Hence I would like to learn Spark SQL performance tuning in detail (what happens behind the scenes, the architecture, and most importantly how to interpret explain plans), so that I can build a solid foundation on the subject. I have past experience with performance tuning on RDBMSs (Teradata, Oracle, etc.).
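For reference, this is the kind of plan output I want to be able to read confidently (hypothetical view name, and assuming Spark 3.x for the FORMATTED mode):

```python
# Physical plan plus parsed/analyzed/optimized plans for one of the
# intermediate views (hypothetical name) via Spark SQL
spark.sql("EXPLAIN FORMATTED SELECT * FROM tv_customer_totals").show(truncate=False)

# Equivalent DataFrame API call
spark.table("tv_customer_totals").explain(mode="formatted")
```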
Since I am very new to this, can anyone please point me in the right direction for books, tutorials, courses, etc. on this subject? I have searched the internet and several online learning platforms, but couldn't find any comprehensive tutorial or resource.
Please help! Thanks in advance.