I have a Parquet record created with Hudi off a Spark Kinesis stream and stored in S3.
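
For context, the write side looks roughly like the sketch below (Hudi 0.5.x on Spark, with the hudi-spark bundle on the classpath). The Kinesis read is omitted, and the table name, fields, and bucket are placeholders rather than the real job:

from pyspark.sql import SparkSession

# Stand-in for the Kinesis-sourced micro-batch; the real job reads
# from a Kinesis stream before reaching this write.
spark = SparkSession.builder.appName("hudi-write-sketch").getOrCreate()
df = spark.createDataFrame([(34551832, "2020-02-11T00:00:00")], ["id", "updated_at"])

hudi_options = {
    "hoodie.table.name": "my_table",                          # placeholder name
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "updated_at", # placeholder field
    # MERGE_ON_READ is what the realtime input format implies; this key is
    # "storage.type" in Hudi 0.5.x and was renamed to "table.type" later.
    "hoodie.datasource.write.storage.type": "MERGE_ON_READ",
}

(df.write
   .format("org.apache.hudi")
   .options(**hudi_options)
   .mode("append")
   .save("s3://my-bucket/my_table"))   # placeholder bucket/path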

An AWS Glue table is generated from this record. I update the input format to org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat per the migration guide: https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi
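
The input format change can be scripted against the Glue catalog with boto3, roughly as below. Database and table names are placeholders, and get_table returns read-only fields that update_table rejects, so they are stripped before the update:

import boto3

glue = boto3.client("glue")

# Fetch the current table definition and swap in the Hudi realtime input format.
table = glue.get_table(DatabaseName="my_schema", Name="my_table")["Table"]
table["StorageDescriptor"]["InputFormat"] = (
    "org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat"
)

# Drop the read-only fields that get_table returns but TableInput rejects.
for key in ("DatabaseName", "CreateTime", "UpdateTime", "CreatedBy",
            "IsRegisteredWithLakeFormation", "CatalogId", "VersionId"):
    table.pop(key, None)

glue.update_table(DatabaseName="my_schema", TableInput=table)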

From the Presto CLI I run:

presto-cli --catalog hive --schema my-schema --server my-server:8889
presto:my-schema> select * from table

This returns:

Query 20200211_185222_00050_hej8h, FAILED, 1 node
Splits: 17 total, 0 done (0.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20200211_185222_00050_hej8h failed: No value present

However, when I run

select id from table

it returns:

    id    
----------
 34551832 
(1 row)

Query 20200211_185250_00051_hej8h, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 93B] [2 rows/s, 213B/s]

Is this expected behaviour, or is there an underlying issue with the setup between Hudi, AWS Glue, and Presto?

Update 12-Feb-2020

Stack trace using the --debug option:

presto:schema> select * from table;

Query 20200212_092259_00006_hej8h, FAILED, 1 node
http://xx-xxx-xxx-xxx.xx-xxxxx-xxx.compute.amazonaws.com:8889/ui/query.html?20200212_092259_00006_hej8h
Splits: 17 total, 0 done (0.00%)
CPU Time: 0.0s total,     0 rows/s,     0B/s, 23% active
Per Node: 0.1 parallelism,     0 rows/s,     0B/s
Parallelism: 0.1
Peak Memory: 0B
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20200212_092259_00006_hej8h failed: No value present
java.util.NoSuchElementException: No value present
    at java.util.Optional.get(Optional.java:135)
    at com.facebook.presto.parquet.reader.ParquetReader.readArray(ParquetReader.java:156)
    at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:282)
    at com.facebook.presto.parquet.reader.ParquetReader.readStruct(ParquetReader.java:193)
    at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:276)
    at com.facebook.presto.parquet.reader.ParquetReader.readStruct(ParquetReader.java:193)
    at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:276)
    at com.facebook.presto.parquet.reader.ParquetReader.readBlock(ParquetReader.java:268)
    at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:247)
    at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:225)
    at com.facebook.presto.spi.block.LazyBlock.assureLoaded(LazyBlock.java:283)
    at com.facebook.presto.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:274)
    at com.facebook.presto.spi.Page.getLoadedPage(Page.java:261)
    at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:254)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:379)
    at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:283)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:675)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
    at com.facebook.presto.$gen.Presto_0_227____20200211_134743_1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
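
The trace dies in ParquetReader.readArray beneath two levels of readStruct, which suggests an array nested inside struct columns is the trigger, not a scalar like id. To isolate the offending column, each one can be probed individually; a rough sketch with the presto-python-client, where host, user, and the table name are placeholders:

import prestodb

conn = prestodb.dbapi.connect(
    host="my-server", port=8889, user="hadoop",  # placeholder host/user
    catalog="hive", schema="my-schema",
)
cur = conn.cursor()

# "table" stands in for the real table name throughout.
cur.execute("SHOW COLUMNS FROM table")
columns = [row[0] for row in cur.fetchall()]

# Scalars like id succeed, so whichever column raises "No value present"
# is the one tripping the Parquet reader.
for col in columns:
    try:
        cur.execute(f'SELECT "{col}" FROM table LIMIT 1')
        cur.fetchall()
        print(f"{col}: ok")
    except Exception as exc:
        print(f"{col}: FAILED -> {exc}")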

1 Answer

It appears the problem may lie elsewhere; an issue has been raised with the Hudi team here: https://github.com/apache/incubator-hudi/issues/1325
