
I get the following exception when reading a Parquet file that has a date column. I am using beam-sdks-java-io* 2.11.0 and parquet* 1.10. Any help would be appreciated.

Thank you in advance.

Caused by: java.lang.IllegalArgumentException: INT96 not yet implemented.
    at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:279)
    at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:264)
    at org.apache.parquet.schema.PrimitiveType$PrimitiveTypeName$7.convert(PrimitiveType.java:297)
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:263)
    at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:241)
    at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:231)
    at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:130)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:183)
    at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
    at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
    at org.apache.beam.sdk.io.parquet.ParquetIO$ReadFiles$ReadFn.processElement(ParquetIO.java:221)
Nikhil_Java
  • This is expected behavior given the implementation in `org/apache/parquet/avro/AvroSchemaConverter.java`: `public Schema convertINT96(PrimitiveTypeName primitiveTypeName) { throw new IllegalArgumentException("INT96 not yet implemented."); }` – J_D Apr 24 '19 at 11:45
  • https://stackoverflow.com/questions/48366196/parquet-data-timestamp-columns-int96-not-yet-implemented-in-druid-overlord-hadoo – J_D Apr 24 '19 at 11:46

2 Answers


The Parquet INT96 type is deprecated, but the parquet-avro library added a property in the 1.12.0 release that lets users with large legacy datasets read INT96 columns as a supported type (a 12-byte fixed array), so the data can be reprocessed and converted. [1]

To enable this, set the parquet.avro.readInt96AsFixed property to "true". [2]

If you are using ParquetIO from Beam:

    PCollection<GenericRecord> records =
        pipeline.apply(ParquetIO.read(SCHEMA).from(options.getMyParquetFilesLocation())
            .withConfiguration(Map.of("parquet.avro.readInt96AsFixed", "true")));
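
Once INT96 values arrive as 12-byte fixed arrays, they still have to be decoded into timestamps. The following is a minimal sketch, assuming the column uses the common Impala/Hive/Spark INT96 timestamp layout (8 bytes of nanoseconds within the day, then a 4-byte Julian day number, both little-endian). The class and method names are hypothetical, and the layout is an assumption about your data, not something the exception or the library guarantees.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper, not part of Beam or parquet-avro.
final class Int96Timestamps {
    // Julian day number of 1970-01-01, the Unix epoch.
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    private Int96Timestamps() {}

    // Decodes a 12-byte INT96 value, assuming the layout described above.
    static Instant toInstant(byte[] int96) {
        if (int96 == null || int96.length != 12) {
            throw new IllegalArgumentException("Expected a 12-byte INT96 value");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // first 8 bytes: nanoseconds since midnight
        long julianDay = buf.getInt();   // last 4 bytes: Julian day number
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY
                + nanosOfDay / NANOS_PER_SECOND;
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay % NANOS_PER_SECOND);
    }
}
```

With Avro generic records, the fixed value would typically be a `GenericData.Fixed`, so the call might look like `Int96Timestamps.toInstant(((GenericData.Fixed) record.get("my_ts")).bytes())`, where `my_ts` is a placeholder field name.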

[1] https://issues.apache.org/jira/browse/PARQUET-1928

[2] https://github.com/apache/parquet-mr/tree/master/parquet-avro

Bruno

According to the parquet-avro documentation, the Parquet INT96 type is not supported in the Avro type mapping.

Ajeesh