I have the following scenario:
case class A(name:String,age:Int)
val df = List(A("s",2)).toDF
df.write.parquet("filePath")
val result = spark.read.parquet("filePath").as[A].select("age")
Is the above optimized to select only age
? Upon seeing result.explain
I see the following
'Project [unresolvedalias('age, None)]
+- Relation[name#48,age#49] parquet
== Analyzed Logical Plan ==
age: int
Project [age#49]
+- Relation[name#48,age#49] parquet
== Optimized Logical Plan ==
Project [age#49]
+- Relation[name#48,age#49] parquet
== Physical Plan ==
*(1) FileScan parquet [age#49] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Volumes/Unix/workplace/Reconciliation/src/TFSReconciliationCore/~/Downloa..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<age:int>
It seems only age
is read. But then what purpose does as
serve ? Am I correct in reading the physical plan ?