
The Parquet spec on logical types, and on timestamps specifically (https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md), seems to say: "TIMESTAMP_MILLIS is used for a combined logical date and time type. It must annotate an int64 that stores the number of milliseconds from the Unix epoch, 00:00:00.000 on 1 January 1970, UTC."
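To make the spec's definition concrete, here is a minimal Python sketch of the TIMESTAMP_MILLIS encoding, assuming timezone-aware `datetime` values. `to_timestamp_millis` and `from_timestamp_millis` are hypothetical helper names, not part of any Parquet library.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as defined by the spec: 00:00:00.000 on 1 January 1970, UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_millis(dt: datetime) -> int:
    """Hypothetical helper: aware datetime -> int64 milliseconds since the Unix epoch."""
    delta = dt - EPOCH
    # Integer arithmetic avoids float rounding; anything finer than a
    # millisecond is dropped (floored toward negative infinity).
    return (delta.days * 86_400 + delta.seconds) * 1_000 + delta.microseconds // 1_000


def from_timestamp_millis(value: int) -> datetime:
    """Hypothetical helper: int64 milliseconds since the Unix epoch -> aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=value)


print(to_timestamp_millis(datetime(2015, 1, 1, tzinfo=timezone.utc)))  # 1420070400000
```

The microsecond part of the input is what gets thrown away, which is exactly the precision loss the question points out.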

That is, the type is only precise to milliseconds, and its values are counted from 1970.

But if you look at the Hive Parquet code in https://github.com/apache/hive/blob/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java#L142 and https://github.com/apache/hive/blob/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTime.java#L54, it seems that Hive encodes timestamps in Parquet according to a different spec: precise to nanoseconds, and counted from "Monday, January 1, 4713" (as defined in jodd.datetime.JDateTime).
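The NanoTime layout can be sketched the same way: a Julian day number plus nanoseconds since midnight. This is a minimal Python sketch, not Hive's code. It assumes the input is already a UTC instant (any time-zone handling on the Hive side is not modeled), and it relies on the commonly used constant 2440588, the Julian day number of 1970-01-01. Python's `datetime` only carries microseconds, so the nanosecond field is always a multiple of 1000 here. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01 (Julian day 0 is 1 January 4713 BC
# in the proleptic Julian calendar).
JULIAN_DAY_OF_EPOCH = 2_440_588
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_julian_nanos(dt: datetime):
    """Hypothetical helper: aware datetime -> (julian_day, nanos_of_day)."""
    delta = dt - EPOCH
    # timedelta normalizes so that delta.days is floored and
    # delta.seconds / delta.microseconds are the non-negative time of day.
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1_000
    return JULIAN_DAY_OF_EPOCH + delta.days, nanos_of_day


def from_julian_nanos(julian_day: int, nanos_of_day: int) -> datetime:
    """Hypothetical helper: (julian_day, nanos_of_day) -> aware UTC datetime."""
    return EPOCH + timedelta(
        days=julian_day - JULIAN_DAY_OF_EPOCH,
        microseconds=nanos_of_day // 1_000,  # sub-microsecond digits are dropped
    )


print(to_julian_nanos(datetime(2015, 1, 1, 12, tzinfo=timezone.utc)))  # (2457024, 43200000000000)
```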

So is Hive's Parquet timestamp storage completely different from the above spec?

What about Date support (https://issues.apache.org/jira/browse/HIVE-8119)? Are we going to have a different on-disk binary encoding than the "int32" specified in the above doc?
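For comparison, the spec's DATE type annotates an int32 holding the number of days since 1 January 1970. A minimal Python sketch follows; the helper names are hypothetical, and the 4-byte little-endian packing assumes Parquet's PLAIN encoding, which stores int32 values little-endian.

```python
import struct
from datetime import date, timedelta

EPOCH_DATE = date(1970, 1, 1)


def to_parquet_date(d: date) -> int:
    """Hypothetical helper: date -> days since 1970-01-01 (negative before the epoch)."""
    return (d - EPOCH_DATE).days


def from_parquet_date(days: int) -> date:
    """Hypothetical helper: days since 1970-01-01 -> date."""
    return EPOCH_DATE + timedelta(days=days)


def pack_int32(value: int) -> bytes:
    # '<i' = little-endian signed 32-bit integer, 4 bytes.
    return struct.pack("<i", value)


print(to_parquet_date(date(2015, 1, 1)))  # 16436
```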

Thanks

teddy teddy

1 Answer


Based on a discussion that used to be linked here but has since been removed, it seems that when support for saving timestamps in Parquet was added to Hive, the primary goal was to be compatible with Impala's implementation, which probably predates the addition of the timestamp_millis type to the Parquet specification.

Impala's timestamp representation maps to the int96 Parquet type (4 bytes for the date, 8 bytes for the time, details here).
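A minimal Python sketch of taking such a 12-byte int96 value apart, assuming the layout commonly described for Impala-written files: the first 8 bytes are the nanoseconds since midnight and the last 4 bytes are the Julian day, both little-endian. The helper names are hypothetical, and the value is interpreted as UTC without any time-zone adjustment.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01.
JULIAN_DAY_OF_EPOCH = 2_440_588
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_int96(julian_day: int, nanos_of_day: int) -> bytes:
    """Hypothetical helper: 8-byte nanos-of-day followed by 4-byte Julian day."""
    return struct.pack("<qi", nanos_of_day, julian_day)


def decode_int96(raw: bytes):
    """Hypothetical helper: 12 raw bytes -> (julian_day, nanos_of_day)."""
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return julian_day, nanos_of_day


def int96_to_datetime(raw: bytes) -> datetime:
    """Interpret the value as a UTC instant (microsecond precision only)."""
    julian_day, nanos_of_day = decode_int96(raw)
    return EPOCH + timedelta(
        days=julian_day - JULIAN_DAY_OF_EPOCH,
        microseconds=nanos_of_day // 1_000,
    )


raw = encode_int96(2_457_024, 43_200_000_000_000)
print(int96_to_datetime(raw))  # 2015-01-01 12:00:00+00:00
```

The `<` prefix in the `struct` format disables alignment padding, so `"<qi"` packs to exactly 12 bytes with the time first and the date last.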

So no, storing a Hive timestamp in Parquet does not use the timestamp_millis type, but Impala's int96 timestamp representation instead.

Zoltan
  • Link is broken. I need to know the specifics of the `int96` representation; any chance you can provide an updated link to the details? – James Wierzba Feb 12 '19 at 02:37
  • Unfortunately that link was not crawled by https://web.archive.org/, but I added a different one to a detailed example of dissecting an int96 timestamp. – Zoltan Feb 12 '19 at 08:26