
I have an Apache Flink application deployed on Kinesis Data Analytics.

This application reads from Kafka and writes to S3. The S3 bucket structure it writes to is computed using a BucketAssigner. A stripped-down version of the BucketAssigner is below.
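
Roughly along these lines (MyEvent and the getter names here are placeholders, not the real ones):

import org.apache.flink.core.io.SimpleVersionedSerializer;
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;

public class MyBucketAssigner implements BucketAssigner<MyEvent, String> {

    @Override
    public String getBucketId(MyEvent element, BucketAssigner.Context context) {
        // Each distinct bucket id becomes a distinct "directory" under the sink's base path,
        // e.g. folder1/folder2/folder3
        return element.getFolder1() + "/" + element.getFolder2() + "/" + element.getFolder3();
    }

    @Override
    public SimpleVersionedSerializer<String> getSerializer() {
        return SimpleVersionedStringSerializer.INSTANCE;
    }
}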

The problem I have is this: let's say we have to write to this directory structure: s3://myBucket/folder1/folder2/folder3/myFile.json

Before making the PUT request, it makes the following HEAD requests:

  • HEAD /folder1
  • HEAD /folder1/folder2
  • HEAD /folder1/folder2/folder3/

And then it makes the PUT request.

It does this for each and every request, which causes S3 rate limiting and thereby backpressure in my Flink application.

I found that someone had a similar issue with BucketingSink: https://lists.apache.org/thread/rbp2gdbxwdrk7zmvwhd2bw56mlwokpzz

The solution mentioned there was to switch to StreamingFileSink, which is what I am doing.

Any ideas on how to fix this in StreamingFileSink?

My SinkConfig is as follows:

StreamingFileSink
  .forRowFormat(new Path(s3Bucket), new JsonEncoder<>())
  .withBucketAssigner(bucketAssigner)
  .withRollingPolicy(DefaultRollingPolicy.builder()
                .withRolloverInterval(60000)
                .build())
  .build()

The JsonEncoder takes the object, converts it to JSON, and writes out the bytes.
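
Roughly like this (a sketch; the use of Jackson's ObjectMapper here is an assumption about the actual implementation):

import java.io.IOException;
import java.io.OutputStream;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.serialization.Encoder;

public class JsonEncoder<T> implements Encoder<T> {

    // Created lazily on the task managers instead of being serialized with the job graph
    private transient ObjectMapper mapper;

    @Override
    public void encode(T element, OutputStream stream) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();
        }
        // One JSON document per line
        stream.write(mapper.writeValueAsBytes(element));
        stream.write('\n');
    }
}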

I have described in more detail how the whole pipeline works in this question, if that helps in any way: Heavy back pressure and huge checkpoint size

Vinod Mohanan
  • Would you be able to test this with the FileSink from Flink? The StreamingFileSink in Flink has been deprecated. – Martijn Visser Mar 17 '22 at 08:27
  • I was running this in debug mode and could see there is a HashMap that maintains active buckets, so the local file system always finds the bucket in the HashMap. Potentially what is happening in my case when I run on S3 is that my cardinality is too high for this HashMap to handle? – Vinod Mohanan Mar 17 '22 at 09:58
  • Also, I am on Flink 1.13 since we are using AWS KDA to run Flink. What would be the alternative to StreamingFileSink? – Vinod Mohanan Mar 17 '22 at 10:00
  • 1
    That would be FileSink. See https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/datastream/file_sink/ – Martijn Visser Mar 17 '22 at 13:17
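
For reference, the FileSink builder in Flink 1.13 mirrors the StreamingFileSink builder, so a rough, untested sketch of the equivalent sink configuration (reusing the same bucketAssigner and JsonEncoder from above) would be:

FileSink
  .forRowFormat(new Path(s3Bucket), new JsonEncoder<>())
  .withBucketAssigner(bucketAssigner)
  .withRollingPolicy(DefaultRollingPolicy.builder()
                .withRolloverInterval(60000)
                .build())
  .build()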

1 Answer


The Hadoop S3 file system tries to imitate a filesystem on top of S3. This means that:

  • before writing a key it checks if the "parent directory" exists by checking for a key with the prefix up to the last "/"
  • it creates empty marker files to mark the existence of such a parent directory
  • all these "existence" requests are S3 HEAD requests which are both expensive and start to violate consistent read-after-create visibility

As a result, the Hadoop S3 file system has very high "create file" latency and hits request rate limits very quickly (HEAD requests have very low request rate limits on S3), so it's best to find ways to write to fewer distinct files.

You might also explore using entropy injection. Entropy injection happens at the file system level, so it should work with the FileSink. However, I'm not sure how it will interact with the partitioning/bucketing being done by the sink, so you may or may not find it usable in practice. If you try it, please report back!
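
For reference, entropy injection is configured on the S3 filesystem in flink-conf.yaml; a minimal sketch with example values (how to set these options on KDA may differ):

s3.entropy.key: _entropy_
s3.entropy.length: 4

The path you write to then has to contain the _entropy_ key for any substitution to happen; paths without it are left unchanged.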

David Anderson
  • Seems like entropy injection is only for checkpoint data. I guess I am stuck: if I reduce cardinality, I will hit limits because of PUTs, and if I increase cardinality, I will hit limits because of HEADs. – Vinod Mohanan Mar 17 '22 at 10:04
  • Entropy injection is happening at the file system level, so it should work with the FileSink. Except I'm not sure how it will interact with the partitioning/bucketing being done by the sink; you may or may not find it useable in practice. If you try it, please report back! – David Anderson Mar 17 '22 at 15:18
  • Sure, will do. For now I removed the sub-prefix that is the source of maximum cardinality, and that seems to give a big boost in performance: zero back pressure. Exploring this further to see when we hit PUT limits. – Vinod Mohanan Mar 18 '22 at 08:26
  • We have fiddled with the bucket's prefix structure and eliminated a couple of root prefixes to prevent them from getting hit with HEAD requests from all the child prefixes, and that has solved the issue for us. But I am really curious: is there any plan to move away from Hadoop FS? Or am I alone on this bucket limits issue, such that it does not need a step like that from the Flink community? – Vinod Mohanan Mar 24 '22 at 15:20
  • You could always ask on the user mailing list, but I don't recall any significant discussion about trying to improve on this. – David Anderson Mar 24 '22 at 15:54