To my great surprise, I realized that the "suppress" operator does not emit the last event when the window closes, but only when another event is published on the partition the stream's tasks belong to. So, how can we emit the final aggregate result without a never-ending event stream? In a CDC pattern we cannot wait for a subsequent database operation, which could take place after a long time, to emit the final result of the previous aggregation.
The idea is to schedule a FutureTask that sends a particular event, say one with a fixed "FLUSH" key, whose timestamp falls outside the previous aggregation window. This "FLUSH" event will then be filtered out after the suppress step.
For every record peeked from the stream, a "FLUSH" event is scheduled, replacing any previously scheduled one that has not yet started, in order to minimize unnecessary "FLUSH" events.
In this example I used a tumbling window, but conceptually it works with other window kinds too.
So, let's suppose we have a topic "user" and we want to aggregate into a list all records falling in a 10-second tumbling window.
The model classes are:
User.java
public class User {
    private String name;
    private String surname;
    private String timestamp;
    //Getters and setters omitted for brevity
}
UserGrouped.java
public class UserGrouped {
    private List<User> userList = new ArrayList<User>();
    //Getter omitted for brevity
}
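Both snippets below also reference a custom JsonSerdes helper (JsonSerdes.User() and JsonSerdes.UserGrouped()) whose implementation is not shown here. As a minimal sketch, assuming Jackson's ObjectMapper and the Serializer/Deserializer interfaces from org.apache.kafka.common.serialization, it could look like this:
JsonSerdes.java (sketch)
public class JsonSerdes {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    //Generic Jackson-backed Serde: serialize to JSON bytes, deserialize back to the given class
    private static <T> Serde<T> jsonSerde(Class<T> clazz) {
        Serializer<T> serializer = (topic, data) -> {
            try {
                return data == null ? null : MAPPER.writeValueAsBytes(data);
            } catch (JsonProcessingException e) {
                throw new SerializationException(e);
            }
        };
        Deserializer<T> deserializer = (topic, bytes) -> {
            try {
                return bytes == null ? null : MAPPER.readValue(bytes, clazz);
            } catch (IOException e) {
                throw new SerializationException(e);
            }
        };
        return Serdes.serdeFrom(serializer, deserializer);
    }

    public static Serde<User> User() {
        return jsonSerde(User.class);
    }

    public static Serde<UserGrouped> UserGrouped() {
        return jsonSerde(UserGrouped.class);
    }
}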
Topology
...
KStream<String, User> userEvents = builder.stream(userTopic, userConsumerOptions);

TimeWindows tumblingWindow = TimeWindows.of(Duration.ofSeconds(windowDuration))
        .grace(Duration.ofSeconds(windowGracePeriod));

KStream<String, UserGrouped> userGroupedStream = userEvents
        .peek((key, value) -> {
            //Skip the "FLUSH" events to avoid a scheduler loop
            if (!key.equalsIgnoreCase("FLUSH")) {
                //For each event, schedule a future task that will send
                //a "FLUSH" event to every partition assigned to the stream.
                scheduleFlushEvent(value.getTimestamp());
            }
        })
        .groupByKey()
        .windowedBy(tumblingWindow)
        .aggregate(
                //INITIALIZER
                () -> new UserGrouped(),
                //AGGREGATOR
                (key, user, userGrouped) -> {
                    userGrouped.getUserList().add(user);
                    return userGrouped;
                },
                //STREAM STORE
                Materialized.<String, UserGrouped, WindowStore<Bytes, byte[]>>
                        as("userGroupedWindowStore")
                        .withKeySerde(Serdes.String())
                        .withValueSerde(JsonSerdes.UserGrouped()) //Custom Serdes
        )
        .suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded().shutDownWhenFull()))
        .toStream((windowedKey, value) -> windowedKey.key())
        //Discard the "FLUSH" events
        .filterNot((key, value) -> key.equalsIgnoreCase("FLUSH"))
        .peek((key, value) -> {
            int sizeList = value != null && value.getUserList() != null ? value.getUserList().size() : 0;
            log.info("#### USER GROUPED KEY: {}, Num of elements: {}", key, sizeList);
        });
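For completeness: the topology and the scheduler method below rely on a few fields the snippets take for granted (windowDuration, windowGracePeriod, ses, scheduledFuture, userFlushProducer). Their names come from the code itself, while the following declarations are only my assumption of a plausible setup:
//Assumed supporting fields: names taken from the snippets, types inferred
private long windowDuration;      //window size, in seconds
private long windowGracePeriod;   //grace period, in seconds
//Plain Kafka producer dedicated to the "FLUSH" events
private Producer<String, String> userFlushProducer;
//Single-threaded scheduler holding at most one pending flush task
private ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
private ScheduledFuture<List<RecordMetadata>> scheduledFuture;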
The scheduler method
private void scheduleFlushEvent(String lastEventTimestamp) {
    //Add 1 second to (windowDuration + windowGracePeriod) to ensure
    //that the flush event will fall outside the last window
    Long delay = Long.valueOf(windowDuration + windowGracePeriod + 1);

    //FIND THE PARTITIONS ASSIGNED TO THE CURRENT STREAM THREAD.
    //The assigned partitions may change after rebalance events,
    //so they must be fetched on every invocation.
    //In a Spring context you can use a RebalanceListener to update a
    //'partitionList' field of this @Component-annotated class.
    Set<Integer> partitionList = new HashSet<Integer>();
    StreamThread currThread = (StreamThread) Thread.currentThread();
    for (TaskMetadata taskMetadata : currThread.threadMetadata().activeTasks()) {
        for (TopicPartition topicPartition : taskMetadata.topicPartitions()) {
            partitionList.add(topicPartition.partition());
        }
    }

    Callable<List<RecordMetadata>> task = () -> {
        try {
            List<RecordMetadata> recordMetadataList = new ArrayList<RecordMetadata>();
            //Shift the last event timestamp forward by 'delay' seconds,
            //so the flush event lands beyond the window end plus the grace period
            Instant instant = Instant.from(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
                    .parse(lastEventTimestamp));
            instant = instant.plusSeconds(delay);
            String flushEventTimestamp = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
                    .withZone(ZoneId.systemDefault())
                    .format(instant);

            User userFlush = new User();
            userFlush.setTimestamp(flushEventTimestamp);
            String userFlushValue = new String(JsonSerdes.User().serializer().serialize(userTopic, userFlush));

            //SEND THE FLUSH EVENT TO EVERY PARTITION ASSIGNED TO THE STREAM THREAD
            for (Integer partition : partitionList) {
                ProducerRecord<String, String> userRecord =
                        new ProducerRecord<String, String>(userTopic, partition, "FLUSH", userFlushValue);
                RecordMetadata recordMetadata = userFlushProducer.send(userRecord).get();
                recordMetadataList.add(recordMetadata);
                log.info("SENT FLUSH EVENT PARTITION: {}, VALUE: {}", partition, userFlushValue);
            }
            return recordMetadataList;
        } catch (Exception e) {
            log.error("ERROR", e);
            return null;
        }
    };

    //TASK NOT SCHEDULED YET
    if (scheduledFuture == null || scheduledFuture.isDone()) {
        log.debug("task scheduled");
        scheduledFuture = ses.schedule(task, delay, TimeUnit.SECONDS);
    //TASK ALREADY SCHEDULED:
    //cancel the previously scheduled task and schedule a new one with a postponed delay
    } else {
        if (!scheduledFuture.isDone() && scheduledFuture.cancel(false)) {
            log.debug("task RE-scheduled");
            scheduledFuture = ses.schedule(task, delay, TimeUnit.SECONDS);
        } else {
            log.warn("task not RE-scheduled");
        }
    }
}
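One last remark: the flush event's payload timestamp computed above only matters if the stream derives event time from the payload; with the default timestamp extractor, Kafka Streams uses the record's metadata timestamp instead. The userConsumerOptions are not shown in this article, but a hypothetical payload-based extractor, using the same timestamp format, could look like this:
public class UserTimestampExtractor implements TimestampExtractor {

    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    @Override
    public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
        //Derive the event time from the payload timestamp, so the "FLUSH"
        //event can advance stream time past the window end plus grace period
        User user = (User) record.value();
        return Instant.from(FORMATTER.parse(user.getTimestamp())).toEpochMilli();
    }
}
It would then be wired in as Consumed.with(Serdes.String(), JsonSerdes.User()).withTimestampExtractor(new UserTimestampExtractor()).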