
First and foremost:

  • I'm fairly new to Flink (I understand the principles and am able to create any basic streaming job I need).
  • I'm using Kinesis Analytics to run my Flink job, and by default it uses incremental checkpointing with a 1-minute interval.
  • The Flink job reads events from a Kinesis stream using a FlinkKinesisConsumer and a custom deserializer (it deserializes the bytes into a simple Java object which is used throughout the job).

What I would like to achieve is simply counting how many events of ENTITY_ID/FOO and ENTITY_ID/BAR there have been over the past 24 hours. It is important that this count is as accurate as possible, which is why I'm using this Flink feature instead of doing a running sum myself on a 5-minute tumbling window. I also want a count of 'TOTAL' events from the start (and not just for the past 24h), so I also output the count of events for the past 5 minutes in the result, so that the post-processing app can simply take these 5 minutes of data and do a running sum. (This count doesn't have to be accurate, and it's ok if there is an outage and I lose some counts.)

Now, this job was working pretty well up until last week, when we had a surge in traffic (10 times more). From that point on, Flink went bananas. The checkpoint size slowly grew from ~500MB to 20GB, checkpoint times were around 1 minute and growing over time, the application started failing and never managed to fully recover, and the event iterator age shot up and never came back down, so no new events were being consumed.

Since I'm new to Flink, I'm not entirely sure whether the way I'm doing the sliding count is completely unoptimised or just plain wrong.

This is a small snippet of the key part of the code:

The source (MyJsonDeserializationSchema extends AbstractDeserializationSchema and simply reads the bytes and creates the Event object):

SourceFunction<Event> source =
      new FlinkKinesisConsumer<>("input-kinesis-stream", new MyJsonDeserializationSchema(), kinesisConsumerConfig);
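
For reference, here is a minimal sketch of what such a schema might look like (this is only my assumption of its shape; the Jackson ObjectMapper usage and the exact class body are not shown in the original post):

import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;

// Hypothetical sketch of the custom deserialization schema described above.
public class MyJsonDeserializationSchema extends AbstractDeserializationSchema<Event> {

  // ObjectMapper is not serializable, so keep it static rather than as an instance field.
  private static final ObjectMapper MAPPER = new ObjectMapper();

  @Override
  public Event deserialize(byte[] bytes) throws IOException {
    // Parse the raw Kinesis record bytes into the Event POJO used throughout the job.
    return MAPPER.readValue(bytes, Event.class);
  }
}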

The input event, a simple Java POJO which will be used in the Flink operators:

public class Event implements Serializable {
  public String entityId;
  public String entityType;
  public String entityName;
  public long eventTimestamp = System.currentTimeMillis();
}

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<Event> eventsStream = env.addSource(source)
      .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(30)) {
        @Override
        public long extractTimestamp(Event event) {
          return event.eventTimestamp;
        }
      });

DataStream<Event> fooStream = eventsStream
      .filter(new FilterFunction<Event>() {
        @Override
        public boolean filter(Event event) throws Exception {
          return "foo".equalsIgnoreCase(event.entityType);
        }
      });

DataStream<Event> barStream = eventsStream
      .filter(new FilterFunction<Event>() {
        @Override
        public boolean filter(Event event) throws Exception {
          return "bar".equalsIgnoreCase(event.entityType);
        }
      });


StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
    Table fooTable = tEnv.fromDataStream(fooStream, "entityId, entityName, entityType, eventTimestamp.rowtime");
    tEnv.registerTable("Foo", fooTable);
    Table barTable = tEnv.fromDataStream(barStream, "entityId, entityName, entityType, eventTimestamp.rowtime");
    tEnv.registerTable("Bar", barTable);

Table slidingFooCountTable = fooTable
      .window(Slide.over("24.hour").every("5.minute").on("eventTimestamp").as("minuteWindow"))
      .groupBy("entityId, entityName, minuteWindow")
      .select("concat(concat(entityId,'_'), entityName) as slidingFooId, entityid as slidingFooEntityid, entityName as slidingFooEntityName, entityType.count as slidingFooCount, minuteWindow.rowtime as slidingFooMinute");

Table slidingBarCountTable = barTable
      .window(Slide.over("24.hout").every("5.minute").on("eventTimestamp").as("minuteWindow"))
      .groupBy("entityId, entityName, minuteWindow")
      .select("concat(concat(entityId,'_'), entityName) as slidingBarId, entityid as slidingBarEntityid, entityName as slidingBarEntityName, entityType.count as slidingBarCount, minuteWindow.rowtime as slidingBarMinute");

    Table tumblingFooCountTable = fooTable
      .window(Tumble.over(tumblingWindowTime).on("eventTimestamp").as("minuteWindow"))
      .groupBy("entityid, entityName, minuteWindow")
      .select("concat(concat(entityName,'_'), entityName) as tumblingFooId, entityId as tumblingFooEntityId, entityNamae as tumblingFooEntityName, entityType.count as tumblingFooCount, minuteWindow.rowtime as tumblingFooMinute");
   
    Table tumblingBarCountTable = barTable
      .window(Tumble.over(tumblingWindowTime).on("eventTimestamp").as("minuteWindow"))
      .groupBy("entityid, entityName, minuteWindow")
      .select("concat(concat(entityName,'_'), entityName) as tumblingBarId, entityId as tumblingBarEntityId, entityNamae as tumblingBarEntityName, entityType.count as tumblingBarCount, minuteWindow.rowtime as tumblingBarMinute");

    Table aggregatedTable = slidingFooCountTable
      .leftOuterJoin(slidingBarCountTable, "slidingFooId = slidingBarId && slidingFooMinute = slidingBarMinute")
      .leftOuterJoin(tumblingFooCountTable, "slidingFooId = tumblingFooId && slidingFooMinute = tumblingFooMinute")
      .leftOuterJoin(tumblingBarCountTable, "slidingFooId = tumblingBarId && slidingFooMinute = tumblingBarMinute")
      .select("slidingFooMinute as timestamp, slidingFooEntityId as entityId, slidingFooEntityName as entityName, slidingFooCount, slidingBarCount, tumblingFooCount, tumblingBarCount");

    DataStream<Result> result = tEnv.toAppendStream(aggregatedTable, Result.class);
    result.addSink(sink); // write to an output stream to be picked up by a lambda function
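
The Result class itself is not shown here; given the field-by-name POJO mapping that toAppendStream uses, I would assume it looks roughly like this hypothetical reconstruction, with field names matching the aliases selected in aggregatedTable:

// Hypothetical reconstruction of the Result POJO; names mirror the select(...) aliases above.
public class Result implements Serializable {
  public java.sql.Timestamp timestamp;   // slidingFooMinute (window rowtime)
  public String entityId;
  public String entityName;
  public Long slidingFooCount;
  public Long slidingBarCount;
  public Long tumblingFooCount;
  public Long tumblingBarCount;
}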

I would greatly appreciate it if someone with more experience working with Flink could comment on the way I have done my counting. Is my code completely over-engineered? Is there a better and more efficient way of counting events over a 24-hour period?

I have read somewhere on Stack Overflow that @DavidAnderson suggested creating our own sliding window using map state and slicing the events by timestamp. However, I'm not exactly sure what this means, and I didn't find any code example showing it. My rough understanding is sketched below.
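
For what it's worth, this is my reading of that suggestion (a hedged sketch only, not David Anderson's actual code; the class name SlicedSlidingCount and the output type are mine): key the stream (e.g. by entityId + entityName), keep one counter per 5-minute slice in MapState, and on an event-time timer sum the slices of the trailing 24 hours while evicting older ones:

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical sketch: assumes the stream is keyed by a String such as entityId + "_" + entityName.
public class SlicedSlidingCount extends KeyedProcessFunction<String, Event, Tuple2<String, Long>> {

  private static final long SLICE_MS = Duration.ofMinutes(5).toMillis();   // slide
  private static final long WINDOW_MS = Duration.ofHours(24).toMillis();   // window size

  // Slice start timestamp -> number of events that fell into that slice.
  private transient MapState<Long, Long> sliceCounts;

  @Override
  public void open(Configuration parameters) {
    sliceCounts = getRuntimeContext().getMapState(
        new MapStateDescriptor<>("sliceCounts", Long.class, Long.class));
  }

  @Override
  public void processElement(Event event, Context ctx, Collector<Tuple2<String, Long>> out) throws Exception {
    // Bucket the event into its 5-minute slice and bump that slice's counter.
    long slice = event.eventTimestamp - (event.eventTimestamp % SLICE_MS);
    Long current = sliceCounts.get(slice);
    sliceCounts.put(slice, current == null ? 1L : current + 1);
    // Fire an event-time timer at the end of the slice to emit the rolling 24-hour count.
    ctx.timerService().registerEventTimeTimer(slice + SLICE_MS - 1);
  }

  @Override
  public void onTimer(long timestamp, OnTimerContext ctx, Collector<Tuple2<String, Long>> out) throws Exception {
    long windowStart = timestamp - WINDOW_MS;
    long total = 0;
    List<Long> expired = new ArrayList<>();
    for (Map.Entry<Long, Long> entry : sliceCounts.entries()) {
      if (entry.getKey() < windowStart) {
        expired.add(entry.getKey());   // slice has fallen out of the 24-hour window
      } else {
        total += entry.getValue();
      }
    }
    for (Long key : expired) {
      sliceCounts.remove(key);         // evict slices older than 24 hours
    }
    out.collect(Tuple2.of(ctx.getCurrentKey(), total));
  }
}

The appeal, as I understand it, is that the per-key state stays bounded at 288 map entries (24h / 5min) no matter how many raw events arrive, but I'd appreciate confirmation that this is what was meant.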

Marco
  • It looks like a good question for the AWS dev forum or AWS support. Imho you may be joining too many events from the same window; it may create too large a state or run out of memory. It's only a possibility (no proof without logs) and I'm not sure what a good solution could be – gusto2 Oct 11 '20 at 22:05
  • AWS is also involved in this, however I suspect the issue is in the Flink job. Looking at the metrics, CPU and heap memory never really go above 50%. It could be an issue on the AWS side, yes, but it's much more likely that my job is the issue – Marco Oct 12 '20 at 08:52

1 Answer


You are creating quite a few windows there. If you create a sliding window with a size of 24 hours and a slide of 5 minutes, there will be a lot of open windows at any given time: each incoming event falls into 24h / 5min = 288 overlapping windows, so you can expect essentially all the data you have received in a given day to be held in window state and checkpointed. It's therefore no surprise that the checkpoint size and duration grow as the data itself grows.
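
For illustration only (this is a sketch of one alternative, not necessarily what your job should do), the same 24h/5min sliding count can be expressed directly in the DataStream API with an incremental AggregateFunction, so that each of those 288 overlapping window panes keeps only a single Long accumulator per key instead of buffering rows; the variable names here are mine and it assumes the fooStream and Event POJO from the question:

// Sketch: incremental counting per 24h/5min sliding window, keyed by entityId + entityName.
DataStream<Tuple2<String, Long>> slidingFooCounts = fooStream
    .keyBy(e -> e.entityId + "_" + e.entityName)
    .window(SlidingEventTimeWindows.of(Time.hours(24), Time.minutes(5)))
    .aggregate(
        new AggregateFunction<Event, Long, Long>() {
          @Override public Long createAccumulator() { return 0L; }
          @Override public Long add(Event value, Long acc) { return acc + 1; }   // only a running count is kept
          @Override public Long getResult(Long acc) { return acc; }
          @Override public Long merge(Long a, Long b) { return a + b; }
        },
        new ProcessWindowFunction<Long, Tuple2<String, Long>, String, TimeWindow>() {
          @Override
          public void process(String key, Context ctx, Iterable<Long> counts, Collector<Tuple2<String, Long>> out) {
            // Emit (key, count) once per 5-minute slide for the trailing 24 hours.
            out.collect(Tuple2.of(key, counts.iterator().next()));
          }
        });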

To answer whether the code can be rewritten, you would need to provide more details on what exactly you are trying to achieve here.

Dominik Wosiński
  • I have updated the question for more clarity on the end goal – Marco Oct 12 '20 at 08:53
  • So, from what you are saying it's not exactly clear why you are using a sliding window here. If you want a full count of all elements that have arrived, why don't you simply go with a tumbling window of 24h? – Dominik Wosiński Oct 12 '20 at 09:37
  • Added a reason for this as well in the post. We don't want to bother doing any kind of complicated post-processing to build our own sliding window; we want to take advantage of Flink, which does it for us nice and clean. The tumbling window is just going to be used to gauge more or less how many events we have had since the beginning; we don't care if that count is messed up by some downtime or errors – Marco Oct 12 '20 at 09:39
  • "Stream-in -> Flink -> Stream-out -> lambda -> Redis". The Result object is written to a sink, and a lambda picks it up to do some more processing and store it in a cache. We use the 24h sliding count to do some statistics for another application. – Marco Oct 12 '20 at 09:45
  • Okay, so you want this to be constantly updated (daily sum), not just once per day, is that right? – Dominik Wosiński Oct 12 '20 at 09:46
  • Exactly, we want the cache updated every 5 minutes with the count for the past 24 hours (hence the sliding window of 24h with a 5-minute slide) – Marco Oct 12 '20 at 09:48