I have a parquet file with the date column filled with a data type I am having trouble with
I understand that Hive and Impala tend to rebase their time stamp...However, I cannot seem to convert or find any pointers on how to solve this.
I have tried setting int96RebaseModeInRead and datetimeRebaseModeInRead mode to legacy
I also tried to apply a date schema onto the read operation but to no avail.
This is with schema applied
These are the documentations I've reviewed so far. Maybe there's a simple solution I am not seeing. Let's also assume that there's no way for me to ask the person who created the source file what the heck they did.
https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option
https://kontext.tech/article/1062/spark-2x-to-3x-date-timestamp-and-int96-rebase-modes
Also, this thread is the only one I was able to find that shows how the timestamp is created but not how to reverse it. Please give me some pointers. parquet int96 timestamp conversion to datetime/date via python