
We are considering using Flink SQL for ad hoc analytics on real-time Kafka data from the past 5–10 minutes. To achieve that, it seems we need to extend the Kafka connector so that it only reads messages within a given period of time, and use that to produce a finite input source.

I am wondering if there is an alternative approach to this. Any suggestions would be very welcome.

yuyang

1 Answer


The Flink Kafka connector supports setting the start position in various ways, including myConsumer.setStartFromTimestamp(...). The Kafka table connector appears to support these same options.

If you want to use Flink's SQL client, you may need to write a thin wrapper that computes the timestamp from 10 minutes ago and sets the starting Kafka offset accordingly.
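A minimal sketch of that wrapper's core logic, using only the JDK's `java.time` API: the `startTimestampMillis` helper and the `StartOffset` class are hypothetical names for illustration, and the Flink consumer setup is shown only as a comment since it depends on your job and connector version.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

public class StartOffset {

    // Compute the epoch-millis timestamp `lookback` in the past. This is the
    // value you would hand to the Kafka consumer's start-from-timestamp option.
    // A Clock is injected so the computation is testable with a fixed time.
    static long startTimestampMillis(Clock clock, Duration lookback) {
        return Instant.now(clock).minus(lookback).toEpochMilli();
    }

    public static void main(String[] args) {
        long start = startTimestampMillis(Clock.systemUTC(), Duration.ofMinutes(10));
        System.out.println("Start reading Kafka from epoch millis: " + start);

        // With a real Flink job (requires the flink-connector-kafka dependency;
        // shown as a comment only):
        // FlinkKafkaConsumer<String> myConsumer = ...;
        // myConsumer.setStartFromTimestamp(start);
    }
}
```

The wrapper would recompute this value each time a query is submitted, so every job starts from "now minus 10 minutes" rather than a fixed offset.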

David Anderson
  • What I want is to have Flink process only Kafka data from the past 10 minutes, and stop after that. It seems that currently the submitted Flink job will run forever, as long as new messages come into Kafka. – yuyang Feb 16 '19 at 16:44
  • Why is this a problem? For ad hoc analysis, seems like you could simply cancel the job when you've seen enough. – David Anderson Feb 16 '19 at 18:23
  • The context is that we are trying to build a service that allows users to submit SQL queries and uses Flink to do ad hoc analytics on real-time Kafka data. We would like the job to stop automatically, instead of requiring users to cancel it, and we need a clear protocol for which data the queries process. – yuyang Feb 16 '19 at 19:01