
I'm using Spark to write my JSON data to S3. However, I keep getting the error below. We are using Apache Hudi for updates. This only happens for some of the data; everything else works fine.

Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file s3a://<path to parquet file>
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
    at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
    at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
    at com.uber.hoodie.func.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:45)
    at com.uber.hoodie.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:44)
    at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:94)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    ... 4 more
Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.avro.AvroConverters$FieldLongConverter

I am unable to understand what's going wrong. Following a few threads, I set --conf "spark.sql.parquet.writeLegacyFormat=true" in my Spark configuration, but even this didn't help.
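For reference, the conf was applied roughly like this (the builder setup and app name here are just an illustrative sketch, not the exact job; the option can equally be passed on spark-submit):

    import org.apache.spark.sql.SparkSession

    // Illustrative only: the same setting can be passed on the command line as
    //   spark-submit --conf "spark.sql.parquet.writeLegacyFormat=true" ...
    // or set directly on the SparkSession builder, as below.
    val spark = SparkSession.builder()
      .appName("hudi-json-to-s3")   // hypothetical app name
      .config("spark.sql.parquet.writeLegacyFormat", "true")
      .getOrCreate()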

– byte_array

1 Answer


Found out the issue: there was a schema mismatch between the existing Parquet files and the incoming data. One of the fields was a string in the existing Parquet schema, but it was being sent as a long in the newer chunk of data.
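For illustration, a minimal sketch of making the types consistent by casting the mismatched field in the incoming data before the Hudi write (the DataFrame, column name, and path below are hypothetical, not taken from the original job):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("schema-align").getOrCreate()

    // Hypothetical names: incomingDf is the new chunk of JSON data; "event_id"
    // stands in for the field that arrives as a long but is a string in the
    // existing Parquet files written by Hudi.
    val incomingDf = spark.read.json("s3a://<path to incoming json>")

    // Cast the mismatched field so every file shares one type, which avoids the
    // FieldLongConverter UnsupportedOperationException when Hudi merges records.
    val alignedDf = incomingDf
      .withColumn("event_id", col("event_id").cast("string"))

    // alignedDf is then written through the usual Hudi upsert path.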

– mythic
  • So how did you solve it? Can you give some steps on what you did to solve this issue? – timedacorn Apr 18 '23 at 09:17
  • As far as I recollect, I cast the data to string in the new data and made it compatible with the existing data. Nothing complex! Just make the data types consistent across all files. Hope this helps! – mythic May 26 '23 at 09:07