The Problem

I'm trying to write a Hudi table to a MinIO S3 bucket with Flink SQL, but the write fails. The table is created, but it contains only the metadata directory .hoodie. The directory tree is as follows:


myminio/flink-hudi
└─ t1
   └─ .hoodie
      ├─ .aux
      │  ├─ .bootstrap
      │  │  ├─ .fileids
      │  │  └─ .partitions
      │  └─ ckp_meta
      ├─ .schema
      ├─ .temp
      └─ archived

To Reproduce

Steps to reproduce the behavior:

  1. Create a Flink Hudi table

CREATE TABLE t1(
  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 's3a://flink-hudi/t1',
  'table.type' = 'MERGE_ON_READ'
);

  2. Insert data into the Hudi table

INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1');

Environment Description

  • Hudi version : 0.12.0

  • Hadoop version : 3.2.4

  • Flink version: 1.15.2

  • Storage (HDFS/S3/GCS..) : minio S3

  • Running on Docker? (yes/no) : no

Additional context

Added dependencies:

  • hadoop-aws-3.2.4.jar
  • aws-java-sdk-bundle-1.11.901.jar
  • flink-s3-fs-hadoop-1.15.2.jar

Properties in Hadoop core-site.xml:

<property>
  <name>fs.s3a.access.key</name>
  <value>xxx</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>xxx</value>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>xxx</value>
</property>
<property>
  <name>fs.s3a.path.style.access</name>
  <value>true</value>
</property>
<property>
  <name>fs.s3.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>

flink-conf.yaml:

taskmanager.numberOfTaskSlots: 4

s3a.endpoint: xxx
s3a.access-key: xxx
s3a.secret-key: xxx
s3a.path.style.access: true

fs.hdfs.hadoopconf: /export/servers/hadoop-3.2.4/etc/hadoop

state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: s3a://flink-state/checkpoint
execution.checkpointing.interval: 30000

classloader.check-leaked-classloader: false

Commands used to start Flink:


export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
./bin/start-cluster.sh
./bin/sql-client.sh embedded -j /opt/flink/jars/hudi-flink1.15-bundle-0.12.0.jar shell

Stacktrace


org.apache.hudi.exception.HoodieException: Exception while scanning the checkpoint meta files under path: s3a://flink-hudi/t1/.hoodie/.aux/ckp_meta
at org.apache.hudi.sink.meta.CkpMetadata.load(CkpMetadata.java:169)
at org.apache.hudi.sink.meta.CkpMetadata.lastPendingInstant(CkpMetadata.java:175)
at org.apache.hudi.sink.common.AbstractStreamWriteFunction.lastPendingInstant(AbstractStreamWriteFunction.java:243)
at org.apache.hudi.sink.common.AbstractStreamWriteFunction.initializeState(AbstractStreamWriteFunction.java:151)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:189)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:171)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:94)
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:122)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:286)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://flink-hudi/t1/.hoodie/.aux/ckp_meta
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2344)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2226)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2160)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1961)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$9(S3AFileSystem.java:1940)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1940)
at org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$listStatus$15(HoodieWrapperFileSystem.java:365)
at org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:106)
at org.apache.hudi.common.fs.HoodieWrapperFileSystem.listStatus(HoodieWrapperFileSystem.java:364)
at org.apache.hudi.sink.meta.CkpMetadata.scanCkpMetadata(CkpMetadata.java:216)
at org.apache.hudi.sink.meta.CkpMetadata.load(CkpMetadata.java:167)
... 18 more

Expected behavior

The Hudi table is written to the S3 bucket successfully.

he wang
  • What happens if you use `s3p://` versus `s3a://` as the protocol? See https://stackoverflow.com/questions/74486511/i-encountered-an-error-when-use-flink-to-insert-data-into-a-apachi-hudi-table/74492812#74492812 – kkrugler Nov 27 '22 at 19:55

1 Answer

Based on your directory tree, the table appears to have been created at the path s3a://flink-hudi/ rather than inside t1, so when you tried to insert the data, Hudi did not find the metadata in the expected place. Try adding a trailing / to the table path:

CREATE TABLE t1(
  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 's3a://flink-hudi/t1/',
  'table.type' = 'MERGE_ON_READ'
);
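
To check whether the trailing slash makes a difference, it may help to list the exact directory named in the stack trace through the same S3A filesystem that Hudi uses. The following is only a sketch: the helper names (with_trailing_slash, ckp_meta_dir, check_ckp_meta) are hypothetical, and it assumes the hadoop CLI is on the PATH and picks up the core-site.xml from the question.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the table path comes from the question.

# Append a trailing slash to a table path unless it already ends with one.
with_trailing_slash() {
  case "$1" in
    */) printf '%s\n' "$1" ;;
    *)  printf '%s/\n' "$1" ;;
  esac
}

# Build the checkpoint-metadata directory that the stack trace reports as missing.
ckp_meta_dir() {
  printf '%s.hoodie/.aux/ckp_meta\n' "$(with_trailing_slash "$1")"
}

# List that directory via the Hadoop filesystem shell.
# Assumes the hadoop CLI is installed and configured for the MinIO endpoint.
check_ckp_meta() {
  hadoop fs -ls "$(ckp_meta_dir "$1")"
}
```

After sourcing the functions, running `check_ckp_meta s3a://flink-hudi/t1` lists s3a://flink-hudi/t1/.hoodie/.aux/ckp_meta. If the listing fails in the same way as the Flink job, the problem is more likely in how the S3A client sees the bucket than in the table path itself.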
Hussein Awala