
I have a sliding window and a custom aggregation accumulator that might produce empty results. What would be a proper way to keep such 'empty' aggregation results from reaching the sink?

        Pipeline pipeline = Pipeline.create();
        pipeline.drawFrom(Sources.<Long, Foo>map("map"))
                .map(Map.Entry::getValue)
                .addTimestamps(Foo::getTimeMillisecond, LIMIT)
                .window(WindowDefinition.sliding(100, 10))
                .aggregate(FooAggregateOperations.aggregateFoo(), (s, e, r) -> {
                    return String.format("started: %s\n%s\nended: %s\n", s, r, e);
                })
                .drainTo(Sinks.files(sinkDirectory));

As you can see, the aggregate operation returns a String:

public class FooAggregateOperations {

    public static AggregateOperation1<Foo, FooAccumulator, String> aggregateFoo() {
        return AggregateOperation
                .withCreate(FooAccumulator::new)
                .andAccumulate(FooAggregateOperations::accumulate)
                .andCombine(FooAggregateOperations::combine)
                .andDeduct(FooAggregateOperations::deduct)
                .andFinish(FooAccumulator::getResult);
    }
}
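
For reference, the question does not show FooAccumulator or the accumulate/combine/deduct helpers it refers to. A minimal, hypothetical sketch of those pieces might look like the following (the count and sum fields and the Foo.getValue() getter are assumptions for illustration only; the relevant detail is that getResult() yields an empty String when nothing was accumulated in the window):

    import java.io.Serializable;

    public class FooAccumulator implements Serializable { // may be sent between cluster members

        long count;
        long sum;

        // used by andFinish(): an "empty" window produces an empty String
        public String getResult() {
            return count == 0 ? "" : "count=" + count + ", sum=" + sum;
        }
    }

    // static helpers in FooAggregateOperations, matching the method references above

    // used by andAccumulate(): fold one Foo item into the accumulator
    static void accumulate(FooAccumulator acc, Foo foo) {
        acc.count++;
        acc.sum += foo.getValue();
    }

    // used by andCombine(): merge another accumulator into this one
    static void combine(FooAccumulator acc, FooAccumulator other) {
        acc.count += other.count;
        acc.sum += other.sum;
    }

    // used by andDeduct(): subtract an accumulator that slid out of the window
    static void deduct(FooAccumulator acc, FooAccumulator other) {
        acc.count -= other.count;
        acc.sum -= other.sum;
    }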

The question is basically: what is the proper way to discard ignorable windows/aggregation results before they are combined/deducted with other results or flushed to the sink?

asked by Viktor Stolbin

1 Answer

To filter out aggregation results that are empty, you can use the following approach:

    Pipeline pipeline = Pipeline.create();
    pipeline.drawFrom(Sources.<Long, Foo>map("map"))
            .map(Map.Entry::getValue)
            .addTimestamps(Foo::getTimeMillisecond, LIMIT)
            .window(WindowDefinition.sliding(100, 10))
            // keep the window bounds and the raw result together in a Tuple3
            // (tuple3 is statically imported from com.hazelcast.jet.datamodel.Tuple3)
            .aggregate(FooAggregateOperations.aggregateFoo(),
                    (s, e, r) -> tuple3(s, e, r))
            // drop windows whose aggregation result is "empty";
            // isEmpty is a predicate you supply (a sketch is shown below)
            .filter(t -> !isEmpty(t.f2()))
            // only non-empty results are formatted and written to the sink
            .map(t -> String.format("started: %s\n%s\nended: %s\n", t.f0(), t.f2(), t.f1()))
            .drainTo(Sinks.files(sinkDirectory));

This stores the window bounds and the aggregation result in a temporary tuple, applies the filter to drop the empty results, and then performs the final mapping to the output string.
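
Note that isEmpty is not part of the Jet API; it is a predicate you supply yourself. Since aggregateFoo() in the question produces a String, a minimal sketch (assuming an empty window is signalled by a blank result) could be:

    // hypothetical helper: treat a null or blank aggregation result as "empty";
    // adapt this to whatever "empty" means for your accumulator
    private static boolean isEmpty(String aggregationResult) {
        return aggregationResult == null || aggregationResult.isEmpty();
    }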

I have also created an issue on GitHub and we'll consider supporting this behaviour right inside the aggregation operation.

answered by Can Gencer