
I have a Kafka topic called A.

The data in topic A has this format:

{ id : 1, name:stackoverflow, created_at:2017-09-28 22:30:00.000}
{ id : 2, name:confluent, created_at:2017-09-28 22:00:00.000}
{ id : 3, name:kafka, created_at:2017-09-28 24:42:00.000}
{ id : 4, name:apache, created_at:2017-09-28 24:41:00.000}

On the consumer side, I want only the latest record in each one-hour window. That is, every hour I need the record from the topic with the most recent created_at.

My expected output is:

{ id : 1, name:stackoverflow, created_at:2017-09-28 22:30:00.000}
{ id : 3, name:kafka, created_at:2017-09-28 24:42:00.000}

I think this can be solved with KSQL, but I'm not sure. Please help me.

Thanks in advance.

Matthias J. Sax
shakeel

1 Answer


Yes, you can use KSQL for this. Try the following:

CREATE STREAM S1 (id BIGINT, name VARCHAR, created_at VARCHAR) WITH (kafka_topic = 'topic_name', value_format = 'JSON');

CREATE TABLE maxRow AS SELECT id, name, max(STRINGTOTIMESTAMP(created_at, 'yyyy-MM-dd HH:mm:ss.SSS')) AS created_at FROM s1 WINDOW TUMBLING (size 1 hour) GROUP BY id, name;

The resulting created_at value is a Unix epoch timestamp in milliseconds (a BIGINT). You can convert it to your desired format with the TIMESTAMPTOSTRING UDF in a new query. Please let me know if you run into any issues.
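
As a rough sketch of that follow-up query, something like the statement below should work. It assumes the table is named maxRow and that the aggregated column is aliased created_at; substitute whatever alias your CREATE TABLE statement actually used. The output alias created_at_str is only an illustrative name.

```sql
-- Sketch only: convert the epoch-millisecond value back into a readable string.
-- Assumes table maxRow with a BIGINT column created_at (adjust to your alias).
-- created_at_str is a hypothetical output column name.
SELECT id,
       name,
       TIMESTAMPTOSTRING(created_at, 'yyyy-MM-dd HH:mm:ss.SSS') AS created_at_str
FROM maxRow;
```

The pattern letters follow Java's date-format conventions, so MM is the month and HH is the hour on a 24-hour clock. On newer ksqlDB releases a continuous query like this may also need EMIT CHANGES appended before the semicolon.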

Matthias J. Sax
Hojjat
  • Thanks for your response. Can I also reduce the 1-hour window to 10 minutes? Would that cause any performance issues? – shakeel Dec 18 '17 at 06:15
  • Sure, you can use `(size 10 minutes)`. It should not have any significant performance issues. – Hojjat Dec 18 '17 at 21:19
  • Thanks for your response. One more question: does KSQL store data in memory or on disk? – shakeel Dec 19 '17 at 04:05
  • The internal state store uses RocksDB and stores the state in memory. The results of queries will be written into Kafka topics, which are of course on disk! – Hojjat Dec 20 '17 at 21:08
  • @matthias-j-sax Is it possible to achieve this using a KTable? If so, are there any examples? – Edayan Jul 17 '19 at 06:48
  • What if the events also include other metric properties to be included into the _latest-event_ view? We can't group by them, and using max on them will clearly give the wrong results. – Holger Brandl Oct 21 '19 at 08:05