
Background

  • I was planning to use S3 to store Flink's checkpoints, using the FsStateBackend. But somehow I was getting the following error.

Error

org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.

Flink version: I am using Flink 1.10.0.

Keshav Lodhi

1 Answer


I have found the solution to this issue, so I am listing the required steps below.

Steps

  1. We need to add some configuration to the flink-conf.yaml file, as listed below.
state.backend: filesystem
state.checkpoints.dir: s3://s3-bucket/checkpoints/ #"s3://<your-bucket>/<endpoint>"
state.backend.fs.checkpointdir: s3://s3-bucket/checkpoints/ #"s3://<your-bucket>/<endpoint>"


s3.access-key: XXXXXXXXXXXXXXXXXXX #your-access-key
s3.secret-key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx #your-secret-key

s3.endpoint: http://127.0.0.1:9000 #your-endpoint-hostname (I have used Minio) 
  2. After completing the first step, we need to copy the respective JAR files (flink-s3-fs-hadoop-1.10.0.jar and flink-s3-fs-presto-1.10.0.jar) from the opt directory to the plugins directory of your Flink installation (see the shell sketch after these steps).

    • E.g.:
      1. Copy /flink-1.10.0/opt/flink-s3-fs-hadoop-1.10.0.jar to /flink-1.10.0/plugins/s3-fs-hadoop/flink-s3-fs-hadoop-1.10.0.jar // Recommended for StreamingFileSink
      2. Copy /flink-1.10.0/opt/flink-s3-fs-presto-1.10.0.jar to /flink-1.10.0/plugins/s3-fs-presto/flink-s3-fs-presto-1.10.0.jar // Recommended for checkpointing
  3. Add this to the checkpointing code (a fuller, runnable sketch follows these steps):

env.setStateBackend(new FsStateBackend("s3://s3-bucket/checkpoints/"))
  4. After completing all the above steps, restart Flink if it is already running.
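
As a concrete sketch of step 2, the copy can be done from a shell. The paths assume the Flink 1.10.0 installation directory used in the example above; adjust them to your setup.

# Plugins must each live in their own subdirectory under plugins/.
mkdir -p /flink-1.10.0/plugins/s3-fs-hadoop /flink-1.10.0/plugins/s3-fs-presto
# Hadoop variant: recommended for the StreamingFileSink.
cp /flink-1.10.0/opt/flink-s3-fs-hadoop-1.10.0.jar /flink-1.10.0/plugins/s3-fs-hadoop/
# Presto variant: recommended for checkpointing.
cp /flink-1.10.0/opt/flink-s3-fs-presto-1.10.0.jar /flink-1.10.0/plugins/s3-fs-presto/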
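
To make step 3 concrete, here is a minimal, self-contained job that checkpoints to S3. The checkpoint interval, the placeholder pipeline, and the job name are illustrative assumptions, not part of the original answer.

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class S3CheckpointExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60 seconds (illustrative; tune for your job).
        env.enableCheckpointing(60_000);

        // Point the state backend at the same bucket configured in flink-conf.yaml.
        env.setStateBackend(new FsStateBackend("s3://s3-bucket/checkpoints/"));

        // Placeholder pipeline so the job is runnable end to end.
        env.fromElements(1, 2, 3)
           .map(i -> i * 2)
           .print();

        env.execute("s3-checkpoint-example");
    }
}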

Note:

  • If you are using both flink-s3-fs-hadoop and flink-s3-fs-presto in Flink, then please use s3p:// specifically for flink-s3-fs-presto and s3a:// for flink-s3-fs-hadoop, instead of s3:// (see the example below).
  • For more details, see the Flink documentation on the S3 file system plugins.
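
For example, with both plugins installed, the checkpoint settings from step 1 and step 3 would use the explicit scheme (same bucket as above):

state.checkpoints.dir: s3p://s3-bucket/checkpoints/ # flink-s3-fs-presto, for checkpointing

env.setStateBackend(new FsStateBackend("s3p://s3-bucket/checkpoints/"))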
Keshav Lodhi
  • One more thing: it is recommended to use `flink-s3-fs-presto` for checkpointing, and not `flink-s3-fs-hadoop`. The Hadoop S3 tries to imitate a real filesystem on top of S3, and as a consequence, it has high latency when creating files and it hits request rate limits quickly. This is because before writing a key, it checks to see if the "parent directory" exists, which can involve a bunch of expensive S3 HEAD requests (which have very low request rate limits). – David Anderson Oct 06 '20 at 17:15
  • Also, with Hadoop S3 you may come to a situation where you fail restore operations because it looks like a state file is not there (a HEAD request leading to false caching in an S3 load balancer). Only after a while will the file be visible, and only then will the restore succeed. – David Anderson Oct 06 '20 at 17:17
  • So I encountered the same issue as you and followed the steps you recommended, but I got a weird error message saying "Caused by: java.lang.IllegalArgumentException: Cannot use the root directory for checkpoints". Did you have it as well? – Shalom Balulu Oct 07 '20 at 08:06