
I'm trying to write a PyFlink application for measuring latency and throughput. My data arrives as JSON objects from a Kafka topic and is loaded into a DataStream using the SimpleStringSchema class for deserialization. Following the answer to this post (How performance can be tested in Kafka and Flink environment?), I have the Kafka producer put timestamps in the events, but I struggle to understand how to access those timestamps now. I am aware that the mentioned post offers a solution to this problem, but I'm struggling to transfer the example to Python, as there is very little documentation and few examples.

This other post (Apache Flink: How to get timestamp of events in ingestion time mode?) suggests that I should define a ProcessFunction instead. However, here too I'm unsure about the syntax. I would probably have to do something like this (taken from https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-python-test/python/datastream/data_stream_job.py):

class MyProcessFunction():

    def process_element(self, value, ctx):
        result = value.get_time_stamp()
        yield result

What would be the correct way to do value.get_time_stamp() here? Or is there maybe an even simpler way of solving my problem that I'm not aware of?

Thanks!

Cipollino

1 Answer


When you set up a table that is backed by a Kafka topic, you can declare a metadata column for the Kafka record timestamp, like the event_time column in this example:

CREATE TABLE KafkaTable (
  `event_time` TIMESTAMP(3) METADATA FROM 'timestamp',
  `partition` BIGINT METADATA VIRTUAL,
  `offset` BIGINT METADATA VIRTUAL,
  `user_id` BIGINT,
  `item_id` BIGINT,
  `behavior` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_behavior',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'csv'
);
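
Since the question is about PyFlink, here is a minimal sketch of how a statement like the one above might be run from Python. It assumes PyFlink 1.14 or later (for `EnvironmentSettings.in_streaming_mode()`), that the Kafka SQL connector jar is available to the job, and that the topic carries JSON, so the format is switched to `json`. The helper names `kafka_table_ddl` and `print_kafka_timestamps` are made up for illustration.

```python
def kafka_table_ddl(topic: str, bootstrap_servers: str, fmt: str = "json") -> str:
    """Build a CREATE TABLE statement that exposes the Kafka record timestamp.

    The column list mirrors the example above; adjust it to the real payload.
    """
    return f"""
        CREATE TABLE KafkaTable (
          `event_time` TIMESTAMP(3) METADATA FROM 'timestamp',
          `user_id` BIGINT,
          `item_id` BIGINT,
          `behavior` STRING
        ) WITH (
          'connector' = 'kafka',
          'topic' = '{topic}',
          'properties.bootstrap.servers' = '{bootstrap_servers}',
          'properties.group.id' = 'testGroup',
          'scan.startup.mode' = 'earliest-offset',
          'format' = '{fmt}'
        )
    """


def print_kafka_timestamps() -> None:
    """Register the table and print rows together with their Kafka timestamps."""
    # Imported lazily so that kafka_table_ddl() stays usable without PyFlink.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    t_env.execute_sql(kafka_table_ddl("user_behavior", "localhost:9092"))
    # Each result row carries the Kafka record timestamp in event_time.
    t_env.execute_sql(
        "SELECT event_time, user_id, behavior FROM KafkaTable"
    ).print()
```

Calling `print_kafka_timestamps()` against a running broker should stream rows to the console until the job is cancelled; the topic name and broker address are the placeholders from the example above.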

See the documentation for Flink's Kafka Table connector for more information about the available metadata columns, such as the record timestamp, partition, offset, and headers.
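
If the timestamp is embedded in the JSON payload by the producer, as described in the question, rather than taken from the Kafka record metadata, it can also be read with plain Python inside a map function. The following is only a sketch: the field name `timestamp` and the assumption that it holds epoch milliseconds are guesses about the payload, and `latency_ms` is a hypothetical helper.

```python
import json
import time
from typing import Optional


def latency_ms(raw_event: str,
               now_ms: Optional[int] = None,
               field: str = "timestamp") -> int:
    """Return the milliseconds elapsed since the producer stamped the event.

    raw_event is the JSON string as delivered by SimpleStringSchema.
    field is the (assumed) name of the producer timestamp in epoch millis.
    """
    event = json.loads(raw_event)
    produced_ms = int(event[field])
    if now_ms is None:
        # Wall-clock time on the machine running this function.
        now_ms = int(time.time() * 1000)
    return now_ms - produced_ms
```

In a DataStream job this could be applied with something like `stream.map(latency_ms)`. The result is only meaningful if the clocks of the producer and the Flink workers are reasonably well synchronized.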

David Anderson