With Spark we can easily read Parquet files and map them to a case class with the following code:
import spark.implicits._
spark.read.parquet("my_parquet_table").as[MyCaseClass]
With Flink, I'm having a lot of trouble doing the same. My case class is generated from an Avro schema, so it is a SpecificRecord.
I tried the following:
import org.apache.flink.core.fs.Path
import org.apache.flink.formats.parquet.ParquetRowInputFormat
val parquetInputFormat = new ParquetRowInputFormat(new Path(path), messageType)
env.readFile(parquetInputFormat, path)
The issue here is the messageType: I was not able to convert either my case class or the Avro schema to a valid MessageType. I tried this:
val messageType = ParquetSchemaConverter.toParquetType(TypeInformation.of(classOf[MyCaseClass]), true)
which fails with the following error:
class org.apache.flink.formats.avro.typeutils.AvroTypeInfo cannot be cast to class org.apache.flink.api.java.typeutils.RowTypeInfo
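Since the case class is generated from an Avro schema anyway, one workaround I'm considering is building the MessageType directly from the Avro schema with parquet-avro's AvroSchemaConverter, bypassing Flink's TypeInformation entirely. This is an untested sketch; it assumes the generated companion object exposes the schema as MyCaseClass.SCHEMA$:

import org.apache.parquet.avro.AvroSchemaConverter
import org.apache.parquet.schema.MessageType

// Untested: convert the Avro schema (not the Flink TypeInformation) to a Parquet MessageType
val messageType: MessageType = new AvroSchemaConverter().convert(MyCaseClass.SCHEMA$)

That MessageType could then be passed to ParquetRowInputFormat as above, though as I understand it the records would come back as Flink Row objects rather than MyCaseClass, so I'd still need a mapping step.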
I could try to use the Table API, but it would mean creating the whole table schema myself, which would be a pain to maintain. If someone can point me to an example implementation, or propose anything that might help, it would be greatly appreciated.
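For reference, the other route I've seen mentioned is going through Flink's Hadoop compatibility layer with parquet-avro's AvroParquetInputFormat, which should yield the SpecificRecord directly. Here is an untested sketch with the batch ExecutionEnvironment (path and MyCaseClass.SCHEMA$ as above), in case it helps frame the question:

import org.apache.flink.api.scala._
import org.apache.flink.hadoopcompatibility.scala.HadoopInputs
import org.apache.hadoop.mapreduce.Job
import org.apache.parquet.avro.AvroParquetInputFormat

val env = ExecutionEnvironment.getExecutionEnvironment
val job = Job.getInstance()
// Read with the Avro schema the case class was generated from
AvroParquetInputFormat.setAvroReadSchema(job, MyCaseClass.SCHEMA$)

// The Hadoop input format produces (Void, MyCaseClass) pairs; keep only the record
val records: DataSet[MyCaseClass] =
  env.createInput(HadoopInputs.readHadoopFile(
      new AvroParquetInputFormat[MyCaseClass],
      classOf[Void],
      classOf[MyCaseClass],
      path,
      job))
    .map(_._2)

But pulling in flink-hadoop-compatibility for this feels heavyweight, so I'd still prefer a native Flink approach if one exists.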