Why do int96 timestamps not work for me?
I want to read the Parquet files with S3 Select. According to the documentation, S3 Select does not support timestamps stored as int96. Storing timestamps in Parquet as int96 is also deprecated.
What did I try?
Firehose uses org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe for serialization to Parquet. (The exact Hive version used by AWS is unknown.) While reading the Hive code, I came across the following config switch: hive.parquet.write.int64.timestamp. I tried to apply this config switch by changing the SerDe parameters in the AWS Glue table config.
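For anyone who wants to reproduce the attempt, the same kind of change can be scripted instead of made in the console. This is only a minimal sketch under stated assumptions: it uses boto3's Glue client (get_table / update_table), the helper names with_serde_parameter and apply_to_glue are made up for illustration, and the value "true" is my guess at how the boolean switch would be spelled as a SerDe parameter. It does not claim that Firehose honors the parameter.

```python
import copy

# Keys that glue.update_table accepts inside TableInput. get_table also returns
# read-only fields (DatabaseName, CreateTime, ...) that have to be dropped.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def with_serde_parameter(table, key, value):
    """Build a TableInput dict from a get_table result with one SerDe parameter set.

    The input dict is not modified.
    """
    table_input = {
        k: copy.deepcopy(v) for k, v in table.items() if k in TABLE_INPUT_KEYS
    }
    serde = table_input.setdefault("StorageDescriptor", {}).setdefault("SerdeInfo", {})
    serde.setdefault("Parameters", {})[key] = value
    return table_input


def apply_to_glue(database, table_name):
    """Hypothetical helper: set the switch on an existing Glue table."""
    import boto3  # imported here so the pure helper above works without boto3

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    table_input = with_serde_parameter(
        table, "hive.parquet.write.int64.timestamp", "true"  # value is an assumption
    )
    glue.update_table(DatabaseName=database, TableInput=table_input)
```

Calling apply_to_glue("my_database", "my_table") (both names are placeholders) would then write the parameter into the table's SerdeInfo.Parameters map.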
Unfortunately, this did not make a difference, and my timestamp column is still stored as int96 (checked by downloading a file from S3 and inspecting it with parq my-file.parquet --schema).