I'm trying to write a PyFlink application for measuring latency and throughput. My data arrives as JSON objects from a Kafka topic and is loaded into a DataStream using the SimpleStringSchema class for deserialization. Following the answer to this post (How performance can be tested in Kafka and Flink environment?), I have the Kafka producer put timestamps in the events, but I struggle to understand how I can access those timestamps now. I am aware that the linked post offers a solution to this problem, but I'm struggling to translate that example to Python, as there is very little documentation and there are very few examples.
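To make my setup concrete: since the producer embeds a wall-clock timestamp in each JSON event, I can at least parse it back out after deserialization and compute a latency. Here is a minimal sketch of that parsing step (the field name producer_ts_ms is just my own convention, not anything standard, and in the real job this would run inside a map on the DataStream):

import json
import time

def extract_latency_ms(raw_event: str, now_ms: int = None) -> int:
    # Parse one JSON event (the raw string that SimpleStringSchema
    # hands over) and return the elapsed time since the producer-side
    # timestamp, in milliseconds. Assumes the producer wrote a field
    # 'producer_ts_ms' holding epoch milliseconds -- my own choice.
    event = json.loads(raw_event)
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - event["producer_ts_ms"]

# Example: an event produced 250 ms before the measurement instant.
sample = json.dumps({"producer_ts_ms": 1_700_000_000_000, "payload": "x"})
print(extract_latency_ms(sample, now_ms=1_700_000_000_250))  # -> 250

This only covers timestamps I put into the payload myself, though, not the timestamps Flink or Kafka attach to the records.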
This other post (Apache Flink: How to get timestamp of events in ingestion time mode?) suggests that I should define a ProcessFunction instead. However, here too I'm unsure about the syntax. I would probably have to do something like this (adapted from https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-python-test/python/datastream/data_stream_job.py):
from pyflink.datastream import ProcessFunction

class MyProcessFunction(ProcessFunction):

    def process_element(self, value, ctx):
        result = value.get_time_stamp()  # <- this is the part I don't know how to write
        yield result
What would be the correct way to do value.get_time_stamp() here? Or is there maybe an even simpler way of solving my problem that I'm not aware of?
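Concretely, I imagine something like the following, where ctx.timestamp() replaces the value.get_time_stamp() placeholder in the snippet above. I'm assuming ctx.timestamp() returns the timestamp Flink attached to the record (which, if I understand the Kafka connector correctly, defaults to the Kafka record timestamp), but that assumption is exactly what I'd like confirmed:

from pyflink.datastream import ProcessFunction

class MyProcessFunction(ProcessFunction):
    # Sketch: emit each element together with the timestamp that
    # Flink attached to it. Whether this timestamp is the one my
    # producer set on the Kafka record is my open question.

    def process_element(self, value, ctx):
        # ctx.timestamp() returns the record's attached timestamp,
        # or None if no timestamp was assigned.
        yield value, ctx.timestamp()

This would then be wired in with ds.process(MyProcessFunction()), if I read the linked data_stream_job.py example correctly.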
Thanks!