
I am very new to Spark and am trying to understand how the DAG is built by inspecting it in the Spark UI.

When I read a simple CSV file with the command

    val df = spark.read.format("csv").option("header", "true").load("/home/user/test.csv")

only one stage is created, with the following DAG:

[image: DAG of the single stage]

I do not understand what the steps "MAP" > "MAPPARTITIONSINTERNAL" > "WHOLESTAGECODEGEN" mean or why they appear.

When I run the same read with the inferSchema option set to true, as in

    spark.read.format("csv").option("header", "true").option("inferSchema", true).load("/home/user/test.csv")

two stages are created, each with its own DAG.

Stage 1 DAG:

[image: DAG of stage 1]

Stage 2 DAG:

[image: DAG of stage 2]

Can anybody please explain why two stages are created when inferSchema is true, and where I can find an explanation of the terms shown in the stages, such as "DESERIALIZETOOBJECT" > "MAP"?

I would appreciate any input that helps me understand the DAG in detail, in particular why job 7 runs multiple "MAP PARTITIONS" steps, then "DESERIALIZETOOBJECT", then "WHOLESTAGECODEGEN", while job 8 again runs "MAP" > "MAPPARTITIONSINTERNAL" > "WHOLESTAGECODEGEN".

AGaur