
I'm developing a PoC with Kafka Streams. I need to get the offset of each record in the stream consumer and use it to generate a unique key for each message, a hash of (topic-offset). The reason is that the producers are syslog sources, and only a few of them include IDs. I cannot generate a UUID in the consumer because, if the data is reprocessed, I need to regenerate the same key.
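A minimal sketch of the deterministic key described above, in plain Java. It assumes SHA-256 as the hash and also includes the partition in the hashed input, because Kafka offsets are only unique within a single partition. The class `DeterministicKeys` and method `keyFor` are hypothetical names. Because the key depends only on the record's coordinates, reprocessing the same record yields the same key.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a stable key from a record's topic, partition and offset.
public final class DeterministicKeys {

    private DeterministicKeys() {}

    public static String keyFor(String topic, int partition, long offset) {
        // Offsets repeat across partitions, so the partition is part of the input.
        String coordinates = topic + "-" + partition + "-" + offset;
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(coordinates.getBytes(StandardCharsets.UTF_8));
            // Hex-encode the 32-byte digest into a 64-character string.
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // Same coordinates, same key: this is what makes reprocessing safe.
        System.out.println(keyFor("syslog", 0, 42L).equals(keyFor("syslog", 0, 42L)));
    }
}
```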

My problem is that the `org.apache.kafka.streams.processor.ProcessorContext` class exposes an `.offset()` method that returns this value, but I'm using a KStream instead of the Processor API, and I couldn't find a method that returns the same thing.

Does anybody know how to extract the consumer offset for each record from a KStream? Thanks in advance.

Konrad

2 Answers


You can mix and match the DSL and the Processor API via `process(...)`, `transform(...)`, and `transformValues(...)`.

This gives you access to the current record's offset, just as in the plain Processor API. In your case, it seems you want to use `KStream#transform(...)`.
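A minimal sketch of this approach, assuming Kafka Streams 2.x or later and String keys and values (neither is stated in the answer). `OffsetKeyTransformer`, `coordinatesKey` and `withOffsetKeys` are hypothetical names. `init()` stores the `ProcessorContext`, and `transform()` reads `topic()`, `partition()` and `offset()` for the record currently being processed. The key here is the plain coordinate string; a hash could be applied on top of it. Note that `KStream#transform(...)` has been deprecated since Kafka Streams 3.3 in favor of `process(...)`.

```java
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;

// Hypothetical transformer: re-keys each record with its topic, partition and offset.
public class OffsetKeyTransformer implements Transformer<String, String, KeyValue<String, String>> {

    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        // Keep a reference: Kafka Streams updates the context's record metadata
        // before each call to transform().
        this.context = context;
    }

    @Override
    public KeyValue<String, String> transform(String key, String value) {
        String newKey = coordinatesKey(context.topic(), context.partition(), context.offset());
        return KeyValue.pair(newKey, value);
    }

    @Override
    public void close() {
        // No resources to release.
    }

    // Offsets are only unique within a partition, so the partition is part of the key.
    static String coordinatesKey(String topic, int partition, long offset) {
        return topic + "-" + partition + "-" + offset;
    }

    // The constructor reference makes the supplier return a new transformer on every get().
    public static KStream<String, String> withOffsetKeys(KStream<String, String> input) {
        return input.transform(OffsetKeyTransformer::new);
    }
}
```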

Matthias J. Sax
  • See also: http://stackoverflow.com/questions/40814437/how-to-filter-keys-and-value-with-a-processor-using-kafka-stream-dsl – Matthias J. Sax Nov 28 '16 at 17:32
  • Even with the Processor API, we only have access to the key and value and not to the ConsumerRecord; only the timestamp extractor seems to have the ConsumerRecord. Can you please add details about accessing the partition ID, timestamp, offset and other fields? – Vignesh Chandramohan Dec 08 '16 at 02:11
  • As mentioned in the question, the Processor API provides a `ProcessorContext` object via the `Processor#init(...)` method. `ProcessorContext` is updated with the metadata of the next record before each call to `process()`. Thus, when `process()` is called, you can get the record offset etc. by calling the corresponding `ProcessorContext` methods. Just keep a reference to it in a class member variable that you initialize with the provided `ProcessorContext` in `init()`. – Matthias J. Sax Dec 08 '16 at 06:28
  • Thanks! ProcessorContext has everything needed. – Vignesh Chandramohan Dec 09 '16 at 00:36
  • @MatthiasJ.Sax How can the latest offset be retrieved? I only find the current offset in the processor context – Andras Hatvani Apr 29 '22 at 11:32
  • You mean the end offset of the input topics? There is no API in Kafka Streams for this. – Matthias J. Sax Apr 30 '22 at 16:55
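For Kafka Streams 3.3 or later, where `KStream#process(...)` returns a stream and `transform(...)` is deprecated, the same pattern can be written with the newer `org.apache.kafka.streams.processor.api` classes. There, topic, partition and offset come from `ProcessorContext#recordMetadata()`, and the timestamp from `Record#timestamp()`. This is a sketch assuming String keys and values; `OffsetKeyProcessor`, `coordinatesKey` and `withOffsetKeys` are hypothetical names.

```java
import java.util.Optional;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.processor.api.RecordMetadata;

// Hypothetical processor (new Processor API): re-keys each record with its coordinates.
public class OffsetKeyProcessor implements Processor<String, String, String, String> {

    private ProcessorContext<String, String> context;

    @Override
    public void init(ProcessorContext<String, String> context) {
        // Same idea as the old API: keep the context provided in init().
        this.context = context;
    }

    @Override
    public void process(Record<String, String> record) {
        // recordMetadata() is empty when there is no current input record
        // (e.g. a record forwarded from a punctuator); such records are dropped here.
        Optional<RecordMetadata> metadata = context.recordMetadata();
        if (metadata.isPresent()) {
            RecordMetadata m = metadata.get();
            String newKey = coordinatesKey(m.topic(), m.partition(), m.offset());
            // withKey() keeps the value, timestamp and headers of the original record.
            context.forward(record.withKey(newKey));
        }
    }

    static String coordinatesKey(String topic, int partition, long offset) {
        return topic + "-" + partition + "-" + offset;
    }

    public static KStream<String, String> withOffsetKeys(KStream<String, String> input) {
        return input.process(OffsetKeyProcessor::new);
    }
}
```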

Unfortunately, it seems that if one Kafka Streams application is assigned multiple partitions, which creates several tasks, the ProcessorContext can end up shared across different tasks, and then I get topic=null, partition=-1, offset=-1.

Has anyone encountered this?


Edit: the cause was that I broke the TransformerSupplier contract by returning the same instance from it. Always creating a new instance fixes the issue.
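A short sketch of the difference, assuming Kafka Streams 2.x or later; `SupplierExample`, `ContextAwareTransformer`, `BROKEN` and `CORRECT` are hypothetical names. With a shared instance, each task's `init()` call overwrites the stored `ProcessorContext`, so records processed by the other tasks read metadata from a context that is not currently processing a record, which fits the null/-1 values described above.

```java
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.kstream.TransformerSupplier;
import org.apache.kafka.streams.processor.ProcessorContext;

public class SupplierExample {

    // Hypothetical transformer that relies on the ProcessorContext of its own task.
    static class ContextAwareTransformer implements Transformer<String, String, KeyValue<String, String>> {
        private ProcessorContext context;

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
        }

        @Override
        public KeyValue<String, String> transform(String key, String value) {
            String newKey = context.topic() + "-" + context.partition() + "-" + context.offset();
            return KeyValue.pair(newKey, value);
        }

        @Override
        public void close() {}
    }

    // Broken: every task gets the same object, so the last init() call wins.
    static final ContextAwareTransformer SHARED = new ContextAwareTransformer();
    static final TransformerSupplier<String, String, KeyValue<String, String>> BROKEN = () -> SHARED;

    // Correct: get() creates a fresh transformer for each task.
    static final TransformerSupplier<String, String, KeyValue<String, String>> CORRECT =
            ContextAwareTransformer::new;
}
```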

Athlan