I am reading a CSV file and writing it to a Parquet file partitioned by a column. After reading the CSV file, this is what I get:
```
>>> df.printSchema()
root
 |-- col1: string (nullable = true)
 |-- col5: double (nullable = true)
 |-- col6: timestamp (nullable = true)
 |-- col7: string (nullable = true)
>>> df.show()
+----+-----+-------------------+-------------+
|col1| col5|               col6|         col7|
+----+-----+-------------------+-------------+
|   f| 3.34|1970-01-01 00:00:00|this is test3|
|   f| 2.13|1980-02-05 00:00:00|this is test3|
|   f|12.13|1981-02-05 00:00:00|this is test3|
|   e|  2.3|1982-03-05 00:00:00|this is test3|
|   e|  2.3|1983-04-12 00:00:00|this is test3|
|   e|212.0|1984-05-04 00:00:00|this is test3|
|   e| 2.13|1985-01-10 00:00:00|this is test3|
+----+-----+-------------------+-------------+
```
When I use this DataFrame to write a partitioned Parquet file, the write succeeds. But when I read the data back and display it, the timestamp column comes back as null, even though its data type is still timestamp:
```
>>> df.write.partitionBy("col1").mode("append").parquet("<some_location>/testparquetData/")
>>> df1 = spark.read.parquet("<some_location>/testparquetData/")
>>> df1.show()
+-----+----+-------------+----+
| col5|col6|         col7|col1|
+-----+----+-------------+----+
|  2.3|null|this is test3|   e|
|  2.3|null|this is test3|   e|
|212.0|null|this is test3|   e|
| 2.13|null|this is test3|   e|
| 3.34|null|this is test3|   f|
| 2.13|null|this is test3|   f|
|12.13|null|this is test3|   f|
+-----+----+-------------+----+
>>> df1.printSchema()
root
 |-- col5: double (nullable = true)
 |-- col6: timestamp (nullable = true)
 |-- col7: string (nullable = true)
 |-- col1: string (nullable = true)
```
I am not sure what exactly is happening here.
Initially I was reading the CSV file with `inferSchema=true` and thought that might be the cause, so I passed the schema explicitly instead. But after writing to the Parquet file and reading it back, `col6` still comes back as null.
Can somebody help me figure out what I am missing here?