Job execution in Spark

Question

I am very new to Spark and was trying to understand how the DAG is created by looking at it in the Spark UI.

When I read a simple CSV file using the command

val df = spark.read.format("csv").option("header", "true").load("/home/user/test.csv")

Spark creates only 1 stage, with the following DAG:

I do not understand what "MAP" > "MAPPARTITIONSINTERNAL" > "WHOLESTAGECODEGEN" means or why these steps appear.

When I run the same command with the "inferSchema" option set to true, 2 stages are created:

spark.read.format("csv").option("header", "true").option("inferSchema", true).load("/home/user/test.csv")

Each stage has its own DAG.

Stage 1 DAG:

Stage 2 DAG:

Can anybody please help me understand why two stages are created when inferSchema is true, and where I can find an explanation of the terms shown in the stages, such as "DESERIALIZETOOBJECT" > "MAP"?

I would appreciate any input that helps me understand the DAG in detail, in particular why job 7 runs multiple "MAP PARTITIONS" steps, then "DESERIALIZETOOBJECT", then "WHOLESTAGECODEGEN", while job 8 runs "MAP" > "MAPPARTITIONSINTERNAL" > "WHOLESTAGECODEGEN".
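
For anyone who wants to reproduce the comparison, here is a minimal sketch of the reads side by side. It assumes a local SparkSession and a small temporary CSV file; the columns "id" and "name" are hypothetical stand-ins, since the contents of /home/user/test.csv are not shown here. The third read supplies an explicit schema, which to my understanding lets Spark skip the extra pass over the data that inferSchema needs, so comparing the jobs each read triggers in the Spark UI may help narrow down where the additional stage comes from.

import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CsvReadPlans {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-read-plans")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample data standing in for /home/user/test.csv.
    val path = Files.createTempFile("test", ".csv")
    Files.write(path, "id,name\n1,a\n2,b\n".getBytes("UTF-8"))
    val file = path.toString

    // Read 1: header only, every column is read as a string.
    val headerOnly = spark.read.format("csv")
      .option("header", "true")
      .load(file)

    // Read 2: header plus schema inference, which has to scan the data.
    val inferred = spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load(file)

    // Read 3: explicit schema (assumed column names and types), no inference.
    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("name", StringType)
    ))
    val explicit = spark.read.format("csv")
      .option("header", "true")
      .schema(schema)
      .load(file)

    // Compare the resulting schemas and the query plans.
    headerOnly.printSchema()
    inferred.printSchema()
    explicit.printSchema()
    inferred.explain(true)

    spark.stop()
  }
}

Running each read on its own in spark-shell and checking the Jobs tab after each one makes it easier to see which jobs and stages belong to which read.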

@thebluephantom can you please give me the links? I have already tried many, but found nothing relevant to the DAG. — AGaur, May 06 '20 at 12:15
Try this, for example: https://stackoverflow.com/questions/37528047/how-are-stages-split-into-tasks-in-spark — thebluephantom, May 06 '20 at 12:25
I found this answer useful: https://stackoverflow.com/questions/67339570/is-spark-read-csv-an-action-on-transformation — Inglocines, Aug 03 '21 at 14:34
