I'm using Spark to write my JSON data to S3, with Apache Hudi handling updates. However, I keep getting the error below. It only happens for some of the data; everything else works fine.
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0
in file s3a://<path to parquet file>
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
    at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
    at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
    at com.uber.hoodie.func.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:45)
    at com.uber.hoodie.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:44)
    at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:94)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    ... 4 more
Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.avro.AvroConverters$FieldLongConverter
I'm unable to figure out what is going wrong here. Following a few threads, I set --conf "spark.sql.parquet.writeLegacyFormat=true" in my Spark configuration, but even that didn't help.
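
For context, here is a stripped-down sketch of how the data gets written; the input path, base path, table name, and key fields below are placeholders, not the real ones:

import org.apache.spark.sql.SaveMode

// Read the incoming JSON batch (placeholder path) and upsert it into the Hudi table.
val df = spark.read.json("s3a://my-bucket/incoming/")

df.write
  .format("com.uber.hoodie")                                   // pre-rename Hudi data source, matching the stack trace
  .option("hoodie.table.name", "my_table")                     // placeholder table name
  .option("hoodie.datasource.write.recordkey.field", "id")     // placeholder record key
  .option("hoodie.datasource.write.precombine.field", "ts")    // placeholder precombine field
  .option("hoodie.datasource.write.operation", "upsert")
  .mode(SaveMode.Append)
  .save("s3a://my-bucket/hudi/my_table")                       // placeholder base path

The failure only shows up during the upsert step, when Hudi reads the existing Parquet files back to merge in the new records.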