
With Spark we can easily read Parquet files into a case class with the following code:

spark.read.parquet("my_parquet_table").as[MyCaseClass]
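
For completeness, here is the full Spark side as a minimal sketch (the case class fields are made up; the product Encoder needed by .as[...] comes from spark.implicits._):

import org.apache.spark.sql.SparkSession

// Made-up case class standing in for MyCaseClass
case class MyCaseClass(id: Long, name: String)

val spark = SparkSession.builder().appName("read-parquet").getOrCreate()
import spark.implicits._ // derives the Encoder[MyCaseClass] needed by .as[...]

val ds = spark.read.parquet("my_parquet_table").as[MyCaseClass]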

With Flink, I'm having a lot of trouble doing the same. My case class is generated from an Avro schema, so it is a SpecificRecord.

I tried the following:

val parquetInputFormat = new ParquetRowInputFormat(new Path(path), messageType)
env.readFile(parquetInputFormat, path)

The issue here is the messageType: I was not able to convert either my case class or the Avro schema into a valid messageType. I tried this:

val messageType = ParquetSchemaConverter.toParquetType(TypeInformation.of(classOf[MyCaseClass]), true)

which fails with the following error: class org.apache.flink.formats.avro.typeutils.AvroTypeInfo cannot be cast to class org.apache.flink.api.java.typeutils.RowTypeInfo
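
For reference, one way to build the MessageType directly from the Avro schema (rather than from Flink's TypeInformation) would be parquet-avro's AvroSchemaConverter. A rough, untested sketch, assuming the generated SpecificRecord exposes its schema as SCHEMA$:

import org.apache.flink.core.fs.Path
import org.apache.flink.formats.parquet.ParquetRowInputFormat
import org.apache.parquet.avro.AvroSchemaConverter
import org.apache.parquet.schema.MessageType

// Build the Parquet MessageType from the Avro schema backing the generated record,
// instead of going through Flink's TypeInformation
val messageType: MessageType = new AvroSchemaConverter().convert(MyCaseClass.SCHEMA$)

val parquetInputFormat = new ParquetRowInputFormat(new Path(path), messageType)
val rows = env.readFile(parquetInputFormat, path) // still yields Rows, not MyCaseClass instances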

I could try to use the Table API, but that would mean writing the whole table schema by hand, which would be a pain to maintain. If someone can point me to an example implementation, or suggest anything else that might help, it would be greatly appreciated.
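
For context, the Table API route I would rather avoid looks roughly like this (a sketch using the filesystem connector with the parquet format; the columns here are made up, and my real schema is far larger and nested, which is why maintaining it by hand is the problem):

import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment

val tableEnv = StreamTableEnvironment.create(env)

// Every column has to be written out and kept in sync with the Avro schema by hand
tableEnv.executeSql(
  """CREATE TABLE my_parquet_table (
    |  id BIGINT,
    |  name STRING
    |) WITH (
    |  'connector' = 'filesystem',
    |  'path'      = 'my_parquet_table',
    |  'format'    = 'parquet'
    |)""".stripMargin)

val table = tableEnv.from("my_parquet_table")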

Fray
  • After another look at the documentation, I see Flink does not support complex types (array, map, etc.) for Parquet, so this is not a valid option for me. – Fray May 26 '21 at 13:10
  • Hi, but was this a root cause of the issue you were having? – Kristoff Dec 09 '21 at 09:21
  • Hi @Kristoff, the issue was not directly linked, but this limitation was too big to continue down this path, and the documentation and community help on this subject are very limited... – Fray Jan 19 '22 at 17:31

0 Answers