
We have a Hadoop cluster based on HDP version 2.6.4, managed with Ambari.

Note: the Hadoop cluster includes HiveServer2 and the Hive Metastore.

We run Presto queries that search data across many partitions in the HDFS filesystem.

The issue is strange because every new query fails on a different snappy.parquet file.

This is a typical example from the query log in the Presto dashboard:

io.prestosql.spi.PrestoException: Error opening Hive split hdfs://hdfsha/POP/GGFR/eyes_data/eyes_data_daily/seen=1/lop_mode=DC/krtfg=202007/day=20200715/part-00157-91a34c06-45b9-40b7-a882-8716fdcf92be.c000.snappy.parquet (offset=0, length=16679): Could not obtain block: BP-2021402966-34.4.23.10-1523182226447:blk_1245415177_171674861 file=/POP/GGFR/eyes_data/eyes_data_daily/seen=1/lop_mode=DC/krtfg=202007/day=20200715/part-00157-91a34c06-45b9-40b7-a882-8716fdcf92be.c000.snappy.parquet
               at io.prestosql.plugin.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:230)
               at io.prestosql.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:120)
               at io.prestosql.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:164)
               at io.prestosql.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:98)
               at io.prestosql.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:45)
               at io.prestosql.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
               at io.prestosql.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:234)
               at io.prestosql.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:169)
               at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320)
               at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
               at io.prestosql.operator.WorkProcessorPipelineSourceOperator.getOutput(WorkProcessorPipelineSourceOperator.java:380)
               at io.prestosql.operator.Driver.processInternal(Driver.java:379)
               at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
               at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
               at io.prestosql.operator.Driver.processFor(Driver.java:276)
               at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
               at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
               at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
               at io.prestosql.$gen.Presto_317____20200727_140355_1.run(Unknown Source)
               at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
               at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-2021402966-34.4.23.10-1523182226447:blk_1245415177_171674861 file=/POP/GGFR/eyes_data/eyes_data_daily/seen=1/lop_mode=DC/krtfg=202007/day=20200715/part-00157-91a34c06-45b9-40b7-a882-8716fdcf92be.c000.snappy.parquet
               at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:879)
               at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:862)
               at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:841)
               at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:997)
               at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1360)
               at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1324)
               at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
               at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
               at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:120)
               at io.prestosql.parquet.reader.MetadataReader.readFully(MetadataReader.java:313)
               at io.prestosql.parquet.reader.MetadataReader.readFooter(MetadataReader.java:92)
               at io.prestosql.plugin.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:161)
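
Since a different file fails each time, it may help to collect the failing block and file from every query log and see whether they share anything. Below is a minimal sketch, assuming the log text has been saved locally and that the message format matches the trace above. `missing_blocks` and `fsck_command` are hypothetical helper names, not part of Presto or Hadoop; the only real tool referenced is the `hdfs fsck` command with its `-files -blocks -locations` options.

```python
import re

# Matches the HDFS client message seen in the trace above:
#   "Could not obtain block: <block pool>:<block id> file=<path>"
# The pattern is an assumption based on that single sample message.
BLOCK_ERROR = re.compile(
    r"Could not obtain block: "
    r"(?P<pool>BP-[\w.\-]+):(?P<block>blk_\d+_\d+) file=(?P<path>\S+)"
)


def missing_blocks(log_text):
    """Return unique (block pool, block id, file path) tuples, in order of appearance."""
    found = []
    for match in BLOCK_ERROR.finditer(log_text):
        entry = (match.group("pool"), match.group("block"), match.group("path"))
        if entry not in found:  # the same block is reported twice per trace
            found.append(entry)
    return found


def fsck_command(path):
    """Build the argument list for an `hdfs fsck` report on one file."""
    return ["hdfs", "fsck", path, "-files", "-blocks", "-locations"]


if __name__ == "__main__":
    import sys

    # Usage: python missing_blocks.py presto_query.log
    with open(sys.argv[1]) as handle:
        for pool, block, path in missing_blocks(handle.read()):
            print(block, " ".join(fsck_command(path)))
```

Running the printed `hdfs fsck` commands on a cluster node should show which DataNodes hold replicas of each block, which may indicate whether the failures point at specific nodes.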

Regarding io.prestosql.spi.PrestoException: Error opening Hive split:

Any idea what could cause this query to fail?

Is it something we can fix by updating Hive parameters in Ambari?

Or should we tune Presto parameters instead?

jessica