When reading a parquet file (using Scala) I read the timestamp field back as:
Int96Value{Binary{12 constant bytes, [0, 44, 84, 119, 54, 49, 0, 0, -62, -127, 37, 0]}}
How can I convert it to a date string?
I did some research for you. The Int96 format is quite specific and seems to be deprecated.
Here is a discussion about converting Int96 to Date.
Based on this, I created the following piece of code:
def main(args: Array[String]): Unit = {
  import java.util.Date
  import org.apache.parquet.example.data.simple.{Int96Value, NanoTime}
  import org.apache.parquet.io.api.Binary
  // Array[Byte] is required here; a plain Array(...) of integer literals is an Array[Int]
  val int96Value = new Int96Value(Binary.fromConstantByteArray(Array[Byte](0, 44, 84, 119, 54, 49, 0, 0, -62, -127, 37, 0)))
  val nanoTime = NanoTime.fromInt96(int96Value)
  // 2440588 is the Julian day number of 1970-01-01.
  // The L suffix keeps the multiplication in Long; as plain Int it overflows and yields a date in 2093.
  val nanosecondsSinceUnixEpoch = (nanoTime.getJulianDay - 2440588) * (86400L * 1000 * 1000 * 1000) + nanoTime.getTimeOfDayNanos
  val date = new Date(nanosecondsSinceUnixEpoch / (1000 * 1000))
  println(date)
}
In my time zone it prints Mon Oct 23 17:01:50 CEST 2017. I am not sure if this is the date that you expected.
Edit: using Instant as suggested:
import java.time.Instant
val nanosInSecond = 1000 * 1000 * 1000
val instant = Instant.ofEpochSecond(nanosecondsSinceUnixEpoch / nanosInSecond, nanosecondsSinceUnixEpoch % nanosInSecond)
println(instant) // prints 2017-10-23T15:01:50Z
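For comparison, here is a minimal Java sketch of the same conversion that reads the 12 bytes directly instead of going through the Parquet helper classes, and then formats the result as a date string. The class and method names (Int96Decoder, toInstant, toDateString) are made up for illustration. It assumes the layout is 8 little-endian bytes of nanoseconds within the day followed by 4 little-endian bytes of Julian day, and that the stored value is meant as UTC; if your writer used a different zone, adjust the formatter accordingly.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Parquet.
public class Int96Decoder {
    // Julian day number of 1970-01-01.
    static final long JULIAN_DAY_OF_EPOCH = 2440588L;
    static final long SECONDS_PER_DAY = 86_400L;
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Assumed layout: bytes 0..7 = nanoseconds of day, bytes 8..11 = Julian day, both little-endian.
    static Instant toInstant(byte[] int96Bytes) {
        ByteBuffer buf = ByteBuffer.wrap(int96Bytes).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        int julianDay = buf.getInt();
        // All arithmetic is done in long, so nothing overflows.
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY
                + nanosOfDay / NANOS_PER_SECOND;
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay % NANOS_PER_SECOND);
    }

    // Assumption: the timestamp is interpreted as UTC.
    static String toDateString(byte[] int96Bytes) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(ZoneOffset.UTC)
                .format(toInstant(int96Bytes));
    }

    public static void main(String[] args) {
        byte[] int96Bytes = { 0, 44, 84, 119, 54, 49, 0, 0, -62, -127, 37, 0 };
        System.out.println(toInstant(int96Bytes));    // prints 2017-10-23T15:01:50Z
        System.out.println(toDateString(int96Bytes)); // prints 2017-10-23 15:01:50
    }
}
```

If the Parquet classes are on the classpath anyway, NanoTime.fromInt96 is probably the safer choice for the byte decoding; the sketch only shows what the bytes contain.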
java.time supports Julian days.
Credits to ygor for doing the research and finding out how to interpret the 12 bytes of your array.
import java.time.*;
import java.time.temporal.JulianFields;

byte[] int96Bytes = { 0, 44, 84, 119, 54, 49, 0, 0, -62, -127, 37, 0 };
// Find Julian day
int julianDay = 0;
int index = int96Bytes.length;
while (index > 8) {
index--;
julianDay <<= 8;
julianDay += int96Bytes[index] & 0xFF;
}
// Find nanos since midnight (the first 8 bytes hold the time of day in nanoseconds)
long nanos = 0;
// Continue from the index we got to
while (index > 0) {
index--;
nanos <<= 8;
nanos += int96Bytes[index] & 0xFF;
}
LocalDateTime timestamp = LocalDate.MIN
.with(JulianFields.JULIAN_DAY, julianDay)
.atStartOfDay()
.plusNanos(nanos);
System.out.println("Timestamp: " + timestamp);
This prints:
Timestamp: 2017-10-23T15:01:50
I’m not happy about converting your byte array to an int and a long by hand, but I don’t know Parquet well enough to use the conversions that are probably available there. Use them if you can.
It doesn’t matter which LocalDate we use as starting point since we are changing it to the right Julian day anyway, so I picked LocalDate.MIN just to pick one.
The way I read the documentation, JulianFields.JULIAN_DAY always refers to the local date, that is, no time zone is understood. Unlike astronomical Julian dates, which start at midday, this integer field counts whole days from midnight to midnight, which is why the time of day is added to the start of the day.
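To see for yourself how java.time numbers Julian days, a small check like the following can help. It is only a sketch with made-up names (JulianDayCheck, julianDayOf, dateOfJulianDay): it asks for the JULIAN_DAY of the Unix epoch date at two different times of day and maps the Julian day from the question back to a date.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.JulianFields;

// Hypothetical helper class for illustration only.
public class JulianDayCheck {
    // Julian day number that java.time assigns to a local date-time.
    static long julianDayOf(LocalDateTime dateTime) {
        return dateTime.getLong(JulianFields.JULIAN_DAY);
    }

    // Local date that java.time assigns to a Julian day number.
    static LocalDate dateOfJulianDay(long julianDay) {
        // The starting date is arbitrary; it is replaced by the given Julian day.
        return LocalDate.ofEpochDay(0).with(JulianFields.JULIAN_DAY, julianDay);
    }

    public static void main(String[] args) {
        // Same Julian day just after midnight and just before the next midnight.
        System.out.println(julianDayOf(LocalDateTime.of(1970, 1, 1, 0, 0, 0)));    // prints 2440588
        System.out.println(julianDayOf(LocalDateTime.of(1970, 1, 1, 23, 59, 59))); // prints 2440588
        // The Julian day decoded from the question's byte array.
        System.out.println(dateOfJulianDay(2458050L));                             // prints 2017-10-23
    }
}
```

The value 2440588 for 1970-01-01 is the same offset that the Scala answer subtracts before converting to nanoseconds since the Unix epoch.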