The Parquet spec on logical types, and on timestamps specifically (https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md), says:
"TIMESTAMP_MILLIS is used for a combined logical date and time type. It must annotate an int64 that stores the number of milliseconds from the Unix epoch, 00:00:00.000 on 1 January 1970, UTC."
i.e. the type is only precise to the millisecond, and it counts from 1970.
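To spell out my reading of that sentence, here is a minimal Python sketch of the encoding. The function names are made up for illustration and nothing here comes from a Parquet library; it only shows that anything finer than a millisecond is dropped.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch named in the spec text quoted above.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_timestamp_millis(dt):
    """Hypothetical helper: whole milliseconds since the Unix epoch."""
    # Floor division of two timedeltas yields an int; sub-millisecond
    # detail is lost at this point.
    return (dt - EPOCH) // timedelta(milliseconds=1)

def from_timestamp_millis(ms):
    """Hypothetical helper: inverse of to_timestamp_millis."""
    return EPOCH + timedelta(milliseconds=ms)
```

For example, a datetime carrying 123456 microseconds comes back with only 123 milliseconds after a round trip.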
But if you look at the hive-parquet code in
https://github.com/apache/hive/blob/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java#L142
https://github.com/apache/hive/blob/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTime.java#L54
it seems that Hive's encoding of timestamps in Parquet follows a different spec: it is precise to the nanosecond and counts from "Monday, January 1, 4713" (BC, i.e. the Julian day epoch, as defined in jodd.datetime.JDateTime).
So is Hive's Parquet timestamp storage completely different from the above spec?
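To make the difference concrete, here is how I understand the Hive side, as a rough Python sketch rather than Hive's actual code. The assumptions are mine, from reading NanoTime.java, and are not in the Parquet doc: the value is 12 bytes, an 8-byte nanoseconds-of-day followed by a 4-byte Julian day number, both little-endian, and 2440588 is the Julian day number that corresponds to 1970-01-01. Any time zone adjustment is ignored, and the function names are made up.

```python
import struct
from datetime import datetime, timedelta

# Assumption: Julian day number assigned to 1970-01-01.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588

def encode_nanotime(julian_day, nanos_of_day):
    """Hypothetical helper: pack int64 nanos-of-day + int32 Julian day (assumed layout)."""
    return struct.pack("<qi", nanos_of_day, julian_day)

def decode_nanotime(raw):
    """Hypothetical helper: return (naive datetime, leftover nanoseconds)."""
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Python datetimes stop at microseconds, so the last three decimal
    # digits are returned separately instead of being dropped silently.
    micros, leftover_nanos = divmod(nanos_of_day, 1000)
    dt = datetime(1970, 1, 1) + timedelta(days=days, microseconds=micros)
    return dt, leftover_nanos
```

If that layout is right, a reader expecting a TIMESTAMP_MILLIS int64 cannot interpret these 12 bytes at all, which is why I am asking.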
What about Date support (https://issues.apache.org/jira/browse/HIVE-8119)? Are we going to have a different on-disk binary encoding than the "int32" specified in the above doc?
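For reference, my understanding of the int32 in that doc is a count of days from the Unix epoch, 1 January 1970. A small Python sketch of that reading, with hypothetical helper names:

```python
from datetime import date, timedelta

UNIX_EPOCH_DATE = date(1970, 1, 1)

def date_to_days(d):
    """Hypothetical helper: days since 1970-01-01 (negative before the epoch)."""
    return (d - UNIX_EPOCH_DATE).days

def days_to_date(days):
    """Hypothetical helper: inverse of date_to_days."""
    return UNIX_EPOCH_DATE + timedelta(days=days)
```

So 1970-01-02 would be stored as 1 and 1969-12-31 as -1, with no Julian day offset involved.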
thanks