1

I want to search for specific messages in a Kafka topic. The only solution I have found is using grep:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning | grep 'world\|hello'
  1. Is there an efficient way to do it?
  2. Is there a way to limit the consumer to a specific offset, meaning it reads from the beginning until it reaches that offset?
igx
  • 2
    I use kt (https://github.com/fgeller/kt), which allows a variety of offset manipulations when reading from a topic. Grep is how I do the filtering. – pwaterz Feb 12 '19 at 14:59
  • If you are performing a filtering operation, use the Streams API. If you really want to consume all messages, use the Consumer API. You should not perform operations based on offsets: the 1000th message a producer sends will not necessarily be at offset 1000 in the partition. https://stackoverflow.com/questions/54544074/how-to-make-restart-able-producer https://stackoverflow.com/questions/54636524/kafka-streams-does-not-increment-offset-by-1-when-producing-to-topic/54638186#54638186 – JR ibkr Feb 12 '19 at 20:29

2 Answers

3

Is there an efficient way to do it?

If you don't have message keys, then no.

If you do, then you can compute the Murmur2 hash of the key to find the partition number and scan only that partition with --partition, still grepping the output.
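As a rough illustration of the partition lookup described above, here is a minimal Python sketch of the Murmur2 variant used by the Java client's default partitioner. It assumes the producer used that default partitioner and that you know the exact serialized key bytes (for example, a UTF-8 string) and the topic's partition count. The function names `murmur2` and `partition_for_key` are made up for this example, and the result is worth checking against your own client before relying on it.

```python
def murmur2(data: bytes) -> int:
    """32-bit Murmur2 as ported from the Kafka Java client (unsigned result)."""
    mask = 0xFFFFFFFF
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (0x9747B28C ^ length) & mask  # seed used by the Java client

    # Mix the input four bytes at a time, little-endian.
    for i in range(length // 4):
        k = int.from_bytes(data[i * 4:i * 4 + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Mix in the remaining 1 to 3 bytes, if any.
    tail = data[length & ~3:]
    rem = length % 4
    if rem == 3:
        h ^= tail[2] << 16
    if rem >= 2:
        h ^= tail[1] << 8
    if rem >= 1:
        h ^= tail[0]
        h = (h * m) & mask

    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for_key(key: bytes, num_partitions: int) -> int:
    """Clear the sign bit, then take the remainder by the partition count."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions
```

The returned number can then be passed to the console consumer's --partition option, so that only the partition holding that key is scanned.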

Is there a way to limit the consumer to a specific offset, meaning it reads from the beginning until it reaches that offset?

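You can pass --max-messages to stop after a fixed number of messages. Note that this is a message count, not an offset; to bound a read by offset on a single partition, combine it with --partition and --offset.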

If you don't want to always start from the beginning, add --group and keep running the same command with the --max-messages parameter. Each run then uses the same consumer group and commits its offsets when done, so the next run resumes where the previous one stopped.

You can also manually set the committed offsets to start from with the kafka-consumer-groups command (its --reset-offsets option).
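If you end up scripting this rather than using the CLI, the "grep until a given offset" idea can be sketched independently of any client library. The example below is a sketch under assumptions: `records` is any iterable of (offset, text) pairs from a single partition in ascending offset order, and how you obtain those pairs (which consumer library, which deserializer) is left open. The name `grep_until_offset` is hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple


def grep_until_offset(
    records: Iterable[Tuple[int, str]],
    pattern: str,
    end_offset: int,
) -> Iterator[Tuple[int, str]]:
    """Yield (offset, text) pairs matching `pattern`, stopping after `end_offset`.

    Assumes the records come from one partition, so offsets only increase.
    """
    regex = re.compile(pattern)
    for offset, text in records:
        if offset > end_offset:
            break  # past the requested bound, stop reading
        if regex.search(text):
            yield offset, text


# Usage with in-memory sample data standing in for consumed messages.
sample = [(0, "hello there"), (1, "nothing"), (2, "big world"), (3, "hello again")]
matches = list(grep_until_offset(sample, r"world|hello", end_offset=2))
```

Comparing against the offset rather than counting messages keeps the bound correct even when offsets in the partition are not contiguous.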

OneCricketeer
-4

Is there an efficient way to do it?

Yes. Your solution is a quick and dirty one. If you want to filter data, use the Streams API and write the filtered records to another topic. https://kafka.apache.org/documentation/streams/

JR ibkr
  • 1
    How is this different from my quick and dirty solution? – igx Feb 14 '19 at 05:37
  • I don't think the question is asking to actually write Java code, only to find an event from the CLI. – OneCricketeer Feb 14 '19 at 10:09
  • Kafka is not just its high-level APIs. There's absolutely no need to create a new topic for this. – aran Feb 14 '19 at 10:23
  • Well, I thought the OP was grepping things out of the topic on a regular basis. Can anyone help me with https://stackoverflow.com/questions/54674867/how-to-reduce-disk-space-occupied-by-a-partition? – JR ibkr Feb 14 '19 at 14:21