I'm trying to build a data-stream processing system, and I want to aggregate the data sent in the last minute by my sensors. The sensors send data to a Kafka server on the sensor topic, and it is consumed by Flink.
The producer is a Python generator using the kafka-python library, and it sends the data as JSON. The JSON contains a field sent holding a timestamp, which is generated in Python every 10 seconds with int(datetime.now().timestamp()), which I know returns a unix timestamp in seconds.
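For reference, the producer side looks roughly like this. It's a minimal sketch, not my exact script: the topic name "sensor" and the sent field are as described above, but the broker address, the reading value, and the helper names (make_payload, run_producer) are placeholders.

```python
import json
import time
from datetime import datetime


def make_payload(value):
    """Build one JSON message; 'sent' is a unix timestamp in whole seconds."""
    return json.dumps({
        "sent": int(datetime.now().timestamp()),
        "value": value,
    }).encode("utf-8")


def run_producer():
    # kafka-python; assumes a broker reachable at localhost:9092
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    while True:
        producer.send("sensor", make_payload(42.0))
        producer.flush()
        time.sleep(10)  # one reading every 10 seconds

# run_producer()  # uncomment to actually send
```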
The problem is that the system prints nothing! What am I doing wrong? Here is my Flink job:
// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

// setting topic and processing the stream from Sensor
DataStream<Sensor> sensorStream = env
        .addSource(new FlinkKafkaConsumer010<>("sensor", new SimpleStringSchema(), properties))
        .flatMap(new ParseSensor()) // parsing into a Sensor object
        .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Sensor>() {
            @Override
            public long extractAscendingTimestamp(Sensor element) {
                return element.getSent() * 1000; // seconds -> milliseconds
            }
        });

sensorStream
        .keyBy(meanSelector)
        .window(TumblingEventTimeWindows.of(Time.minutes(1)))
        .apply(new WindowMean(dataAggregation))
        .print();
While trying to make this work, I came across the method .timeWindow() as an alternative to .window(), and it worked! To be precise, I wrote .timeWindow(Time.minutes(1)).
N.B.: even though Flink ran for 5 minutes, the window was printed only once!