When reading erasure-coding-enabled Hive external tables in an on-prem HDFS environment with Iceberg jars using Spark 3.3.1, I get the error below. I am able to read the same table when it is created with the default configuration, i.e. without erasure coding. Is there any Spark or HDFS config that will allow us to read erasure-coded Hive tables in Spark 3.3?
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-177356802-10.28.113.126-1620307273641:blk_-9223372036631387152_45059009
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:976)
at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1083)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1439)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1402)
at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:78)
at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:107)
at org.apache.orc.impl.ReaderImpl.read(ReaderImpl.java:701)
at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:793)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:566)
at org.apache.orc.OrcFile.createReader(OrcFile.java:385)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$2(OrcFileFormat.scala:146)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2763)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:146)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:209)
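For reference, this is roughly how the table is being read. It is a minimal sketch: the database and table names below are placeholders, not the actual identifiers.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of the failing read. `db.ec_enabled_table` is a
// placeholder; the real table is a Hive external ORC table whose
// HDFS directory has an erasure coding policy applied.
val spark = SparkSession.builder()
  .appName("ec-table-read")
  .enableHiveSupport()
  .getOrCreate()

// The same query succeeds against a copy of the table stored with the
// default 3x replication; it fails with BlockMissingException only on
// the erasure-coded directory.
spark.sql("SELECT * FROM db.ec_enabled_table LIMIT 10").show()
```

The erasure coding policy on the table's directory can be confirmed with `hdfs ec -getPolicy -path <table-location>`.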