
I'm building a data stream processing system and I want to aggregate the data sent by my sensors over the last minute. The sensors publish their data to a Kafka broker on the sensor topic, which is then consumed by Flink.

I'm using a Python generator with the kafka-python library, and I send the data in JSON format. Each JSON message contains a field sent holding a timestamp, which is generated in Python every 10 seconds with int(datetime.now().timestamp()); as far as I know, this returns a Unix timestamp in seconds.
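The producer side described above can be sketched as follows (the field names other than sent, the sensor id, and the broker address are assumptions, not taken from the question):

```python
import json
from datetime import datetime

def make_reading(sensor_id, value):
    """Build one sensor message; 'sent' is a Unix timestamp in SECONDS."""
    return json.dumps({
        "id": sensor_id,      # hypothetical field name
        "value": value,       # hypothetical field name
        "sent": int(datetime.now().timestamp()),
    })

# Sending with kafka-python would look like this (requires a running broker):
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="localhost:9092")
# producer.send("sensor", make_reading("s1", 21.5).encode("utf-8"))
```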

The problem is that the system prints nothing! What am I doing wrong?

// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

// setting topic and processing the stream from Sensor
DataStream<Sensor> sensorStream = env.addSource(new FlinkKafkaConsumer010<>("sensor", new SimpleStringSchema(), properties))
                                     .flatMap(new ParseSensor()) // parsing into a Sensor object
                                     .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Sensor>() {
                                         @Override
                                         public long extractAscendingTimestamp(Sensor element) {
                                             return element.getSent()*1000;
                                         }
                                     });
sensorStream.keyBy(meanSelector)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            .apply(new WindowMean(dataAggregation))
            .print();
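For reference, the aggregation the job above is meant to perform (a per-key mean over 1-minute tumbling event-time windows keyed on the sensor) can be sketched in plain Python; the id and value field names are assumptions:

```python
from collections import defaultdict

def tumbling_window_means(readings, window_secs=60):
    """Group readings into tumbling windows by their 'sent' timestamp
    (in seconds) and return the mean 'value' per (key, window_start)."""
    sums = defaultdict(lambda: [0.0, 0])  # (key, window_start) -> [sum, count]
    for r in readings:
        # A 'sent' of 65 with 60-second windows falls into window_start 60.
        window_start = (r["sent"] // window_secs) * window_secs
        bucket = sums[(r["id"], window_start)]
        bucket[0] += r["value"]
        bucket[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}

readings = [
    {"id": "s1", "sent": 0,  "value": 10.0},
    {"id": "s1", "sent": 30, "value": 20.0},
    {"id": "s1", "sent": 65, "value": 40.0},
]
# window [0, 60) -> mean 15.0 ; window [60, 120) -> mean 40.0
```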

During my attempts to make this work, I found the method .timeWindow() instead of .window(), and it worked! More precisely, I wrote .timeWindow(Time.minutes(1)). N.B.: even though Flink ran for 5 minutes, the window was printed only once!

TheNobleSix
  • Do you have data coming from the source continuously? (I mean at least one record every minute) – ImbaBalboa Apr 04 '17 at 13:43
  • I get 100 json every 10 seconds – TheNobleSix Apr 04 '17 at 13:44
  • Do you have env.execute() somewhere? Streaming flink jobs are executed lazily, and only after execute() is called. – David Anderson May 03 '17 at 12:00
  • Could you please explain why are you multiplying your timestamp by 1000? Also, try with processing time and see if you are getting all the desired windowed events, if yes, then your watermark is not advancing somehow (it needs to be triggered) you need to manually do that then by implementing assignerWithPeriodicWatermark or PunctuatedWatermark depending on your usecase. – Biplob Biswas Aug 14 '17 at 15:26
