
I am using the Confluent Kafka REST Proxy to consume records from a topic. My intention is to consume only the first 100 records from the topic. I am using the following REST API to fetch records:

GET /consumers/testgroup/instances/my_consumer/records

How can I achieve this? Any ideas?

Achaius
  • You are using a consumer group that keeps the consumer offset, so when you ask for new records you're not getting the first records of the topic. You get new records that you haven't yet consumed. Do you really want the first 100 records of the topic, or do you want to consume 100 records on each REST API call? – Alexandre Juma Dec 03 '18 at 12:11
  • Also, it seems that the only size control parameter you can use with the [GET records endpoint](https://docs.confluent.io/current/kafka-rest/docs/api.html#get--consumers-(string-group_name)-instances-(string-instance)-records) is `max_bytes`, which does not translate directly to number of records, but should work for you. – Alexandre Juma Dec 03 '18 at 12:24
  • I don't think it's possible: `Consumer configuration - Although consumer instances are not shared, they do share the underlying server resources. Therefore, limited configuration options are exposed via the API. However, you can adjust settings globally by passing consumer settings in the REST Proxy configuration.` But https://docs.confluent.io/current/kafka-rest/docs/config.html doesn't mention any relevant setting – Lior Chaga Dec 03 '18 at 12:33

3 Answers


As far as I'm aware this is not currently possible. As mentioned in the other answer, you can specify a max size in bytes (although this can actually be ignored by the brokers in some cases) but you cannot specify the desired number of messages.

However, such a feature can easily be implemented in your client code. You could guess a rough size, query the REST API, and see how many messages you've received. If it's fewer than 100, query again for the next few messages until you reach 100.
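
For example, a minimal Java sketch of that loop might look like the following. The proxy URL, the `json` embedded format, and the `max_bytes` hint of 300000 are assumptions, and a real client would also need a retry or timeout policy if the topic holds fewer than 100 records:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class FirstHundredRecords {

    // Assumed REST Proxy address; the consumer instance is the one from the question
    private static final String RECORDS_URL =
            "http://localhost:8082/consumers/testgroup/instances/my_consumer/records";

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        ObjectMapper mapper = new ObjectMapper();
        List<JsonNode> collected = new ArrayList<>();

        // Keep polling until 100 records are collected; max_bytes is only a rough size hint
        while (collected.size() < 100) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(RECORDS_URL + "?max_bytes=300000"))
                    // Assumes the consumer instance was created with "format": "json"
                    .header("Accept", "application/vnd.kafka.json.v2+json")
                    .GET()
                    .build();

            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());

            JsonNode records = mapper.readTree(response.body());
            if (records.size() == 0) {
                break; // nothing returned this time; a real client would retry or back off
            }
            for (JsonNode record : records) {
                if (collected.size() < 100) {
                    collected.add(record); // keep at most 100, ignore the rest
                }
            }
        }

        System.out.println("Consumed " + collected.size() + " records");
    }
}
```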

Mickael Maison
  • OK, but I don't know anything about the incoming data; it is dynamic. So along with max_bytes, Confluent could support a number-of-messages limit when retrieving from a topic. – Achaius Dec 03 '18 at 13:55

You can use the property ConsumerConfig.MAX_POLL_RECORDS_CONFIG to configure your KafkaConsumer. Please see the docs.
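
Note that this is a setting for the Java KafkaConsumer client rather than the REST Proxy. A minimal sketch of how it could be applied, assuming a local broker and a topic named `test` as placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MaxPollRecordsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "testgroup");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Cap the number of records returned by a single poll() at 100
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test")); // placeholder topic name
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```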

Rahul Agarwal
Kirill S

If you're trying to consume new batches of 100 messages from your consumer group, you should set max_bytes to a value that, for your data model, will always return roughly 100 records. You can use more conservative logic (fetch fewer, then fetch more until you cut off at 100) or always fetch more and ignore the excess. Either way, you should adopt manual offset management for your consumer group.

GET /consumers/testgroup/instances/my_consumer/records?max_bytes=300000

If you get more than 100 messages and for some reason you ignore the extras, you will not receive them again on that consumer group if offset auto-commit is enabled (this is defined when you create your consumer). You probably don't want that to happen!

If you're manually committing offsets, then you can ignore whatever you want, as long as you then commit the correct offsets to guarantee you don't lose any messages. You can manually commit your offsets like this:

POST /consumers/testgroup/instances/my_consumer/offsets HTTP/1.1
Host: proxy-instance.kafkaproxy.example.com
Content-Type: application/vnd.kafka.v2+json

{
  "offsets": [
    {
      "topic": "test",
      "partition": 0,
      "offset": <calculated offset ending where you stopped consuming for this partition>
    },
    {
      "topic": "test",
      "partition": 1,
      "offset": <calculated offset ending where you stopped consuming for this partition>
    }
  ]
}

If you're trying to get exactly the first 100 records of the topic, then you need to reset the consumer group offsets for that topic and each of its partitions before you consume again. You can do it this way (taken from the Confluent docs):

POST /consumers/testgroup/instances/my_consumer/offsets HTTP/1.1
Host: proxy-instance.kafkaproxy.example.com
Content-Type: application/vnd.kafka.v2+json

{
  "offsets": [
    {
      "topic": "test",
      "partition": 0,
      "offset": 0
    },
    {
      "topic": "test",
      "partition": 1,
      "offset": 0
    }
  ]
}
Alexandre Juma
  • By using `max_bytes` to control the number of returned messages, you assume all messages have exactly the same size. In practice, that's rarely the case, so this is unlikely to work – Mickael Maison Dec 03 '18 at 13:13
  • Yes, that's why I stressed " you should set max_bytes to a value that, for your data model, will always return around 100 records", but I will further clarify. – Alexandre Juma Dec 03 '18 at 13:52