
I'm reading from a Parquet file, and per the schema our dates are stored as INT96, which comes back as a byte[12]. So when I read the file, a date looks like an Object holding [0, 0, 0, 0, 0, 0, 0, 0, -63, -120, 37, 0].

Does anyone know how I'd convert this to a usable format so I can get the date? Or is there a way to configure my S3 Parquet file reader so that it returns the actual date instead of that byte array?

Thanks!

  • Update: Found an answer. First I cast the value to `GenericData.Fixed`, then did:
    ```
    byte[] bytes = fixed.bytes();
    // Little-endian encoding: need to invert the byte order.
    long timeOfDayNanos = Longs.fromBytes(bytes[7], bytes[6], bytes[5], bytes[4], bytes[3], bytes[2], bytes[1], bytes[0]);
    int julianDay = Ints.fromBytes(bytes[11], bytes[10], bytes[9], bytes[8]);
    // ts is milliseconds since the Unix epoch; the constants should be longs so the multiplication cannot overflow.
    long ts = ((julianDay - JULIAN_EPOCH_OFFSET_DAYS) * MILLIS_IN_DAY) + (timeOfDayNanos / NANOS_PER_MILLISECOND);
    ```
    – NuttronIndustries Oct 20 '22 at 00:14
  • Can you please cite the source where you got this from? I need to do the same and would like to know more about it. – Kiran K Mar 02 '23 at 16:09
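
For anyone landing here, the conversion in the update above can also be sketched with only the JDK (no Guava). This is a minimal sketch, not a definitive implementation: the class name `Int96Timestamps` and the method `toInstant` are hypothetical, and it assumes the common INT96 timestamp layout of 8 little-endian bytes of nanoseconds within the day followed by 4 little-endian bytes of Julian day number. The offset 2440588 is the Julian day number of 1970-01-01.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper, not part of Parquet or Avro.
final class Int96Timestamps {
    // Julian day number of 1970-01-01 (the Unix epoch).
    private static final long JULIAN_EPOCH_OFFSET_DAYS = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;

    // Decodes a 12-byte INT96 timestamp, assuming the layout
    // [8 bytes nanos-of-day, little endian][4 bytes Julian day, little endian].
    static Instant toInstant(byte[] int96) {
        if (int96 == null || int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0..7
        long julianDay = buf.getInt();   // bytes 8..11, widened to long
        long epochSeconds = (julianDay - JULIAN_EPOCH_OFFSET_DAYS) * SECONDS_PER_DAY;
        // ofEpochSecond normalizes the nanosecond adjustment into seconds + nanos.
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay);
    }

    public static void main(String[] args) {
        // The byte array from the question.
        byte[] sample = {0, 0, 0, 0, 0, 0, 0, 0, -63, -120, 37, 0};
        System.out.println(toInstant(sample)); // prints 2022-09-18T00:00:00Z
    }
}
```

With the Avro route described in the update, the input would be the array returned by `fixed.bytes()`. Returning an `Instant` keeps the nanosecond precision that the millisecond `long` drops, and `toEpochMilli()` recovers the same `ts` value if that is what you need.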

0 Answers