We have a Hadoop cluster based on HDP version 2.6.4, managed with Ambari.
Note: the Hadoop cluster includes HiveServer2 and the Hive Metastore.
We are running Presto queries that search data across many partitions in the HDFS filesystem.
The issue is strange because every new query fails on a different snappy.parquet file.
This is a typical example from the query log in the Presto dashboard:
io.prestosql.spi.PrestoException: Error opening Hive split hdfs://hdfsha/POP/GGFR/eyes_data/eyes_data_daily/seen=1/lop_mode=DC/krtfg=202007/day=20200715/part-00157-91a34c06-45b9-40b7-a882-8716fdcf92be.c000.snappy.parquet (offset=0, length=16679): Could not obtain block: BP-2021402966-34.4.23.10-1523182226447:blk_1245415177_171674861 file=/POP/GGFR/eyes_data/eyes_data_daily/seen=1/lop_mode=DC/krtfg=202007/day=20200715/part-00157-91a34c06-45b9-40b7-a882-8716fdcf92be.c000.snappy.parquet
at io.prestosql.plugin.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:230)
at io.prestosql.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:120)
at io.prestosql.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:164)
at io.prestosql.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:98)
at io.prestosql.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:45)
at io.prestosql.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
at io.prestosql.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:234)
at io.prestosql.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:169)
at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320)
at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
at io.prestosql.operator.WorkProcessorPipelineSourceOperator.getOutput(WorkProcessorPipelineSourceOperator.java:380)
at io.prestosql.operator.Driver.processInternal(Driver.java:379)
at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
at io.prestosql.operator.Driver.processFor(Driver.java:276)
at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
at io.prestosql.$gen.Presto_317____20200727_140355_1.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-2021402966-34.4.23.10-1523182226447:blk_1245415177_171674861 file=/POP/GGFR/eyes_data/eyes_data_daily/seen=1/lop_mode=DC/krtfg=202007/day=20200715/part-00157-91a34c06-45b9-40b7-a882-8716fdcf92be.c000.snappy.parquet
at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:879)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:862)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:841)
at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:997)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1360)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1324)
at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:120)
at io.prestosql.parquet.reader.MetadataReader.readFully(MetadataReader.java:313)
at io.prestosql.parquet.reader.MetadataReader.readFooter(MetadataReader.java:92)
at io.prestosql.plugin.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:161)
Regarding io.prestosql.spi.PrestoException: Error opening Hive split:
Any idea what could cause this query to fail?
Is it something we can fix by updating Hive parameters in Ambari?
Or should we tune Presto parameters instead?