0

Issue After I parse the parquet file, the timestamp value is showing as 16 character long string, e.g. "ACrpbPIZAABOfyUA". I would like to convert it into a readable timestamp format like 2017-10-24T03:01:50 in Java.

SO Search In SO, I find that my question is similar to parquet int96 timestamp conversion to datetime/date via python. However, I need to programmatically convert the string value into timestamp in Java. So, the solution for Doug to use spark-shell or parquet-tools command line in the mentioned post doesn't work for me.

Further Search I also found that if the timestamp value is in 12 byte array format, I can convert it following Cast int96 timestamp from parquet to golang. However, I got stuck on understanding the string format or figuring out a way to convert the string into the 12 byte array format.

Your help is highly appreciated!

  • How are you parsing the parquet file? Will you show a [mcve] that illustrates what you have done so far? – Code-Apprentice May 26 '23 at 21:09
  • I ask because my first suggestion is to find a library that gives you a date object rather than the string. If you are already using a library for parsing the file, see if it has a function for this conversion. If you are not relying on a library and insist on doing all the work yourself, then you might need to use base64 decoding to turn the string into an array of bytes. This is just a guess because the example you give looks like it might be base64 encoded. – Code-Apprentice May 26 '23 at 21:11
  • 3
    That String looks like it is Base64 encoded... `Base64.getDecoder().decode("ACrpbPIZAABOfyUA")` will return a 12 byte array – user16320675 May 26 '23 at 21:34
  • I found a Java solution [here](https://stackoverflow.com/questions/53690299/int96value-to-date-string). You'll need to find a way to convert the `String` value to a byte-array. – Reilas May 26 '23 at 23:34
  • 2
    If I decode _"ACrpbPIZAABOfyUA"_ from _Base64_, I get _"2016-02-03T07:55:29"_. – Reilas May 27 '23 at 01:11
  • @user16320675, you are correct that the string is base64 encoded. I tested it that I can successfully parse the string to timestamp format. – etcxuyr May 29 '23 at 15:59

2 Answers2

2

I converted the Go code from here.

LocalDateTime int96ToJulian(byte[] parquetDate) {
    ByteBuffer buffer;
    buffer = ByteBuffer.wrap(Arrays.copyOfRange(parquetDate, 0, 8));
    buffer.order(ByteOrder.LITTLE_ENDIAN);
    long nano = buffer.getLong();
    buffer = ByteBuffer.wrap(Arrays.copyOfRange(parquetDate, 8, parquetDate.length));
    buffer.order(ByteOrder.LITTLE_ENDIAN);
    int dt = buffer.getInt();

    long l = dt + 68569;
    long n = 4 * l / 146097;
    l = l - (146097*n+3)/4;
    long i = 4000 * (l + 1) / 1461001;
    l = l - 1461*i/4 + 31;
    long j = 80 * l / 2447;
    long k = l - 2447*j/80;
    l = j / 11;
    j = j + 2 - 12*l;
    i = 100*(n-49) + i + l;

    LocalDateTime localDateTime = LocalDateTime.of((int) i, (int) j, (int) k, 0, 0, 0);
    return localDateTime.plusNanos(nano);
}

Example using "2000-01-01 12:34:56", here.

byte[] bytes = { 0x00, 0x60, (byte) 0xfd, 0x4b, 0x32, 0x29, 0x00, 0x00, 0x59, 0x68, 0x25, 0x00 };
System.out.println(int96ToJulian(bytes).format(DateTimeFormatter.ISO_DATE_TIME));

Output

2000-01-01T12:34:56
Reilas
  • 3,297
  • 2
  • 4
  • 17
0

Here's another way of slicing it. I'm not 100% happy as there's possibly something more solid in the java.util.time package when it comes to treating the Julian date, but as I said above, my results correspond with those of @Reilas:

import java.time.LocalDate;
import java.time.LocalTime;
import java.time.LocalDateTime;
import java.util.Base64;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int69 {
    /*
     * Julian Day is a well-known system that represents the count of whole days
     * since day 0, which is defined to be January 1, 4713 BCE in the Julian
     * calendar, and -4713-11-24 Gregorian.
     */
    public final static LocalDate JULIAN_YEAR_ZERO = LocalDate.parse("-4713-11-24");

    public static void main(String[] args) {
        try {
            LocalDateTime a = Int96.int69ToLocalDateTime("ACrpbPIZAABOfyUA");
            System.out.println(a);
            byte[] bytes = { 0x00, 0x60, (byte) 0xfd, 0x4b, 0x32, 0x29, 0x00, 0x00, 0x59, 0x68, 0x25, 0x00 };

            LocalDateTime b = Int96.int69ToLocalDateTime(bytes);
            System.out.println(b);
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }

    public static LocalDateTime int69ToLocalDateTime(byte[] int69) {
        ByteBuffer bb = ByteBuffer.wrap(int69).order(ByteOrder.LITTLE_ENDIAN);
        long nanoSeconds = bb.getLong();
        int daysSinceYearZero = bb.getInt();
        LocalTime time = LocalTime.MIN.plusNanos(nanoSeconds);
        return JULIAN_YEAR_ZERO.plusDays(daysSinceYearZero).atTime(time);
    }

    public static LocalDateTime int69ToLocalDateTime(String base64) {
        return int69ToLocalDateTime(Base64.getDecoder().decode(base64));
    }

}
g00se
  • 3,207
  • 2
  • 5
  • 9
  • Can you provide any insight on the date format of, _"-4713-11-24"_. I am misunderstanding; Julian years are in _["astronomical years"](https://en.wikipedia.org/wiki/Astronomical_year_numbering)_? – Reilas May 27 '23 at 16:04
  • Certainly. I commented the code above. That's taken from [here](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/time/temporal/JulianFields.html#JULIAN_DAY) – g00se May 27 '23 at 16:31