I use a TumblingWindow with EventTime. Here is a simplified version of my program:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic( TimeCharacteristic.EventTime );

DataStream<String> inputStream = env.addSource(
    new FlinkKafkaConsumer09<>( kafkaInput, new SimpleStringSchema(), prop ) );

SingleOutputStreamOperator<Measure> measureStream = inputStream
    .flatMap( new StringToFloatMeasureFlatMapper() )
    .returns( Measure.class )
    .keyBy( "objectId" )
    // extract event data from measure object
    .assignTimestampsAndWatermarks( new BoundedLatenessMeasureMarker() );

SingleOutputStreamOperator<MeanMinMaxStdAccumulator> stdResults =
    stdMean.keyBy( "objectId" )
        .timeWindow( Time.minutes(15) )
        .sum("floatValue");

stdResults.print();
env.execute();
I extract the timestamps and create the watermarks using the following class:
public class BoundedLatenessMeasureMarker implements AssignerWithPeriodicWatermarks<Measure>{
    private final long maxOutOfOrderness = 4000;
    private long currentMaxTimestamp;

    @Override
    public long extractTimestamp(Measure measure, long previousElementTimestamp) {
        long time = measure.timestamp.getTime();
        currentMaxTimestamp = Math.max(time, currentMaxTimestamp);
        return time;
    }

    @Override
    public Watermark getCurrentWatermark() {
        // return the watermark as current highest timestamp minus the out-of-orderness bound
        Watermark watermark = new Watermark( currentMaxTimestamp - maxOutOfOrderness );
        return watermark;
    }
}
My problem is: when I run this against real data (i.e. using a Kafka topic that receives input from about 2000 sensors generating measures every few seconds), it works well, but when I try to reproduce it on a test topic, the windows never close.
Here is a sample of my test data:
kafka-console-consumer --bootstrap-server localhost:9092 --topic input-test --from-beginning
{"objectId":2,"unitSymbol":"V","timestamp":"2017-01-09T16:57:18","value":"1.0","type":"float","floatValue":0.0}
{"objectId":2,"unitSymbol":"V","timestamp":"2017-01-09T16:58:38","value":"1.0","type":"float","floatValue":0.0}
{"objectId":2,"unitSymbol":"V","timestamp":"2017-01-09T17:07:49","value":"1.0","type":"float","floatValue":0.0}
{"objectId":2,"unitSymbol":"V","timestamp":"2017-01-09T18:37:59","value":"1.0","type":"float","floatValue":0.0}
{"objectId":2,"unitSymbol":"V","timestamp":"2017-01-09T19:18:19","value":"1.0","type":"float","floatValue":0.0}
{"objectId":2,"unitSymbol":"V","timestamp":"2017-01-09T19:20:16","value":"1.0","type":"float","floatValue":0.0}
As you can see, the timestamps are clearly ordered in time and cover more than one 15-minute window. By adding some println statements, I could also determine that the watermarks are updating and that the fold (in this case the sum) is being performed. But the windows never finalize, so stdResults.print() never outputs anything.
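As a sanity check on the sample above, the window assignment can be worked out by hand. The following is a minimal sketch in plain Java with no Flink dependency. The class name WindowCheck and its helpers are hypothetical. It assumes that the timestamps are parsed as UTC, that the tumbling windows are aligned to the epoch with no offset, and that a window becomes eligible to fire once the watermark reaches its end minus one millisecond.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of the job: reproduces the window arithmetic offline.
public class WindowCheck {
    static final long WINDOW_MS = 15 * 60 * 1000L;      // Time.minutes(15)
    static final long MAX_OUT_OF_ORDERNESS_MS = 4000L;  // bound used in BoundedLatenessMeasureMarker

    // Timestamps copied from the test topic sample
    static final List<String> SAMPLE = Arrays.asList(
        "2017-01-09T16:57:18", "2017-01-09T16:58:38", "2017-01-09T17:07:49",
        "2017-01-09T18:37:59", "2017-01-09T19:18:19", "2017-01-09T19:20:16");

    // Assumption: the zone-less timestamps are interpreted as UTC
    static long toEpochMillis(String isoLocal) {
        return LocalDateTime.parse(isoLocal).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Start of the epoch-aligned tumbling window containing the timestamp
    static long windowStart(long timestamp) {
        return timestamp - (timestamp % WINDOW_MS);
    }

    // Watermark after all records were seen: highest timestamp minus the bound
    static long finalWatermark(List<String> timestamps) {
        long max = Long.MIN_VALUE;
        for (String t : timestamps) {
            max = Math.max(max, toEpochMillis(t));
        }
        return max - MAX_OUT_OF_ORDERNESS_MS;
    }

    // Number of distinct windows whose last millisecond is covered by the final watermark
    static int countFireableWindows(List<String> timestamps) {
        long watermark = finalWatermark(timestamps);
        TreeSet<Long> starts = new TreeSet<>();
        for (String t : timestamps) {
            starts.add(windowStart(toEpochMillis(t)));
        }
        int fireable = 0;
        for (long start : starts) {
            long maxTimestamp = start + WINDOW_MS - 1;
            if (watermark >= maxTimestamp) {
                fireable++;
            }
        }
        return fireable;
    }

    public static void main(String[] args) {
        System.out.println("final watermark (epoch ms): " + finalWatermark(SAMPLE));
        System.out.println("fireable windows: " + countFireableWindows(SAMPLE));
    }
}
```

Under these assumptions the six records fall into four windows (starting at 16:45, 17:00, 18:30 and 19:15), and the final watermark of 19:20:12 lies past the end of the first three, so only the 19:15 window would be expected to remain open.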
Does anybody have a hint about what is going on here?
UPDATE: I tried switching from the periodic timestamp assigner to the AscendingTimestampExtractor, with the same result.
UPDATE: I also checked the Kafka topics; they have the exact same configuration.