While reading any CSV, Spark always splits the load into three stages, regardless of whether the file is small, large, or contains only a header row. There are always three jobs, each with one stage per job. My application performs no transformation or action; it only loads the CSV.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class WordCount {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Java Spark Application")
                .master("local")
                .getOrCreate();
        Dataset<Row> df = spark.read()
                .format("com.databricks.spark.csv")
                .option("inferSchema", "true")
                .option("header", "true")
                .load("/home/ist/OtherCsv/EmptyCSV.csv");
        spark.close();
    }
}
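To isolate which option triggers the extra work, a variant without inferSchema and header could be compared against the code above. This is only a sketch for experimentation, not a confirmed explanation; it uses the built-in "csv" format name available in Spark 2.x+ instead of the external com.databricks.spark.csv package, and the same placeholder path as above:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PlainCsvRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Plain CSV Read")
                .master("local")
                .getOrCreate();

        // Same file, but without inferSchema/header: Spark does not have to
        // scan the data up front to guess column types or read the header line,
        // so fewer jobs should appear in the UI before any action is called.
        Dataset<Row> df = spark.read()
                .format("csv") // built-in CSV source (Spark 2.x+)
                .load("/home/ist/OtherCsv/EmptyCSV.csv");

        spark.close();
    }
}
```

Comparing the Jobs tab of the Spark UI for both versions would show whether the extra jobs come from the read options rather than from the load itself.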
Spark UI images:
- three jobs in the Spark UI
- stage-related info
- all three stages have the same DAG visualization
- all three jobs have the same DAG visualization
- the event timeline
Questions:
- Why is loading or reading a CSV always split into exactly three stages and three jobs?
- Why are three jobs created when there is no action?
- How are stages formed at the code level?
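One way to check the job count empirically, rather than only through the UI, is to register a SparkListener before the read and count job/stage events. This is an illustrative sketch (the class name, temp-file path, and sleep duration are made up for the example), assuming the jobs really are triggered by the read itself:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerJobStart;
import org.apache.spark.scheduler.SparkListenerStageSubmitted;
import org.apache.spark.sql.SparkSession;

public class JobCounter {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Write a tiny CSV so the example is self-contained.
        String path = "/tmp/job_counter_example.csv";
        try (FileWriter w = new FileWriter(path)) {
            w.write("a,b\n1,2\n");
        }

        SparkSession spark = SparkSession.builder()
                .appName("Job Counter")
                .master("local")
                .getOrCreate();

        AtomicInteger jobs = new AtomicInteger();
        AtomicInteger stages = new AtomicInteger();
        spark.sparkContext().addSparkListener(new SparkListener() {
            @Override
            public void onJobStart(SparkListenerJobStart jobStart) {
                jobs.incrementAndGet();
            }
            @Override
            public void onStageSubmitted(SparkListenerStageSubmitted stageSubmitted) {
                stages.incrementAndGet();
            }
        });

        // No explicit action follows this read; any jobs counted were
        // triggered by the read options themselves (e.g. schema inference).
        spark.read()
                .format("csv")
                .option("inferSchema", "true")
                .option("header", "true")
                .load(path);

        // Listener events are delivered asynchronously, so wait briefly
        // before reading the counters.
        Thread.sleep(2000);
        System.out.println("jobs=" + jobs.get() + " stages=" + stages.get());

        spark.close();
    }
}
```

If the counters match the three jobs seen in the UI, that would confirm the jobs originate from the read call and its options rather than from any downstream action.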