Having this 12 byte array (int96) to timestamp.
[128 76 69 116 64 7 0 0 48 131 37 0]
How do I cast it to timestamp?
I understand the first 8 byte should be cast to int64 millisecond that represent an epoch datetime.
The first 8 bytes are time in nanosecs, not millisecs. They are not measured from the epoch either, but from midnight. The date part is stored separatly in the last 4 bytes as Julian day number.
Here is the result of an experiment I did earlier that may help. I stored '2000-01-01 12:34:56' as an int96 and dumped with parquet-tools:
$ parquet-tools dump hdfs://path/to/parquet/file | tail -n 1
value 1: R:0 D:1 V:117253024523396126668760320
Since 117253024523396126668760320 = 0x60FD4B3229000059682500, the 12 bytes are 00 60 FD 4B 32 29 00 00 | 59 68 25 00, where | shows the boundary between the time and the date parts.
00 60 FD 4B 32 29 00 00 is the time part. We need to reverse the bytes because int96 timestamp use a reverse byte order, thus we get 0x000029324BFD6000 = 45296 * 10^9 nanoseconds = 45296 seconds = 12 hours + 34 minutes + 56 seconds.
59 68 25 00 is the date part, if we reverse the bytes we get 0x00256859 = 2451545 as the Julian day number, which corresponds to 2000-01-01.
@Zoltan you definitely deserve the vote although you didn't supply a Golang sulotion.
Thanks to you and to https://github.com/carlosjhr64/jd
I wrote a function func int96ToJulian(parquetDate []byte) time.Time
func int96ToJulian(parquetDate []byte) time.Time {
nano := binary.LittleEndian.Uint64(parquetDate[:8])
dt := binary.LittleEndian.Uint32(parquetDate[8:])
l := dt + 68569
n := 4 * l / 146097
l = l - (146097*n+3)/4
i := 4000 * (l + 1) / 1461001
l = l - 1461*i/4 + 31
j := 80 * l / 2447
k := l - 2447*j/80
l = j / 11
j = j + 2 - 12*l
i = 100*(n-49) + i + l
tm := time.Date(int(i), time.Month(j), int(k), 0, 0, 0, 0, time.UTC)
return tm.Add(time.Duration(nano))
}