
I have grouped my Kafka events:

    private static void createImportStream(final StreamsBuilder builder, final Collection<String> topics) {
        // consume the topics as UserEvent records (matching the value serde)
        final KStream<byte[], UserEvent> stream = builder.stream(topics, Consumed.with(Serdes.ByteArray(), new UserEventThriftSerde()));
        stream.filter((key, request) -> Objects.nonNull(request))
              // re-key each event by its sourceType
              .groupBy(
                      (key, value) -> Integer.valueOf(value.getSourceType()),
                      Grouped.with(Serdes.Integer(), new UserEventThriftSerde()))
              // collect all events per sourceType into one list
              .aggregate(ArrayList::new,
                      (key, value, aggregatedValue) -> {
                          aggregatedValue.add(value);
                          return aggregatedValue;
                      },
                      Materialized.with(Serdes.Integer(), new ArrayListSerde<UserEvent>(new UserEventThriftSerde())))
              .toStream();
    }

How can I add a window that is based not on time but on the number of events? The reason is that the events arrive as a bulk dump, so a time-windowed aggregation would not fit: all events could appear within the same few seconds.

Alex P.

1 Answer


Kafka Streams does not support count-based windows out of the box, because they are non-deterministic and it is hard to handle out-of-order data.

Instead of using the DSL, though, you can use the Processor API to build a custom operator for your use case.
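As a rough illustration of that approach, here is a minimal sketch of a count-based "window" built with a `Transformer` and a key-value state store. The class name `CountBasedBatcher`, the store name `event-batch-store`, the output topic `batched-events`, and the batch size are all made up for this example; `UserEvent`, `UserEventThriftSerde`, and `ArrayListSerde` are taken from your question.

    import java.util.ArrayList;
    import java.util.Objects;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.Stores;

    // Emits one "window" per key as soon as batchSize events have been collected for that key.
    public class CountBasedBatcher implements Transformer<Integer, UserEvent, KeyValue<Integer, ArrayList<UserEvent>>> {

        static final String STORE_NAME = "event-batch-store"; // hypothetical store name

        private final int batchSize;
        private KeyValueStore<Integer, ArrayList<UserEvent>> store;

        public CountBasedBatcher(final int batchSize) {
            this.batchSize = batchSize;
        }

        @Override
        @SuppressWarnings("unchecked")
        public void init(final ProcessorContext context) {
            store = (KeyValueStore<Integer, ArrayList<UserEvent>>) context.getStateStore(STORE_NAME);
        }

        @Override
        public KeyValue<Integer, ArrayList<UserEvent>> transform(final Integer key, final UserEvent value) {
            ArrayList<UserEvent> batch = store.get(key);
            if (batch == null) {
                batch = new ArrayList<>();
            }
            batch.add(value);
            if (batch.size() >= batchSize) {
                // the count-based "window" is complete: emit the batch and start a new one
                store.delete(key);
                return KeyValue.pair(key, batch);
            }
            store.put(key, batch);
            return null; // emit nothing until the count is reached
        }

        @Override
        public void close() { }
    }

Wiring it into your topology could look roughly like this:

    // register the store and plug the transformer into the topology
    builder.addStateStore(Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore(CountBasedBatcher.STORE_NAME),
            Serdes.Integer(),
            new ArrayListSerde<UserEvent>(new UserEventThriftSerde())));

    stream.filter((key, request) -> Objects.nonNull(request))
          .selectKey((key, value) -> Integer.valueOf(value.getSourceType()))
          .transform(() -> new CountBasedBatcher(1000), CountBasedBatcher.STORE_NAME)
          .to("batched-events", Produced.with(Serdes.Integer(), new ArrayListSerde<UserEvent>(new UserEventThriftSerde())));

Note that `selectKey()` alone does not repartition the stream, so with multiple input partitions, events with the same `sourceType` may end up in different tasks and thus in different local batches; if a batch must be complete per `sourceType`, you would need a repartitioning step (e.g. `repartition()` or `through()`) before the `transform()`.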

Matthias J. Sax