I am using Spring Boot and want to read data from Kafka in batches. My application.yml looks like this:

spring:
  kafka:
    bootstrap-servers:
      - localhost:9092
    properties:
      schema.registry.url: http://localhost:8081
    consumer:
      auto-offset-reset: earliest
      max-poll-records: 50000
      enable-auto-commit: true
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer
      group-id: "batch"
      properties:
        fetch.min.bytes: 1000000
        fetch.max.wait.ms: 20000
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
    listener:
      type: batch

My listener:

@KafkaListener(id = "bar2", topics = "TestTopic")
public void listen(List<ConsumerRecord<String, GenericRecord>> records) {
   log.info("start of batch receive. Size::{}", records.size());
}

In the log I see:

2019-10-04 11:08:19.693  INFO 2123 --- [     bar2-0-C-1] kafka.batch.demo.DemoApplication         : start of batch receive. Size::33279
2019-10-04 11:08:19.746  INFO 2123 --- [     bar2-0-C-1] kafka.batch.demo.DemoApplication         : start of batch receive. Size::33353
2019-10-04 11:08:19.784  INFO 2123 --- [     bar2-0-C-1] kafka.batch.demo.DemoApplication         : start of batch receive. Size::33400
2019-10-04 11:08:19.821  INFO 2123 --- [     bar2-0-C-1] kafka.batch.demo.DemoApplication         : start of batch receive. Size::33556
2019-10-04 11:08:39.859  INFO 2123 --- [     bar2-0-C-1] kafka.batch.demo.DemoApplication         : start of batch receive. Size::16412

I set the required settings, fetch.min.bytes and fetch.max.wait.ms, but they have no effect.

In the log, a batch never exceeds about 33 thousand records, whatever the settings. I have racked my brain and I don't understand why this is happening.

Gary Russell
All_Safe
  • There are other settings, like fetch.max.bytes, that you might want to look at – OneCricketeer Sep 26 '19 at 10:17
  • You probably need to look at this: https://stackoverflow.com/questions/51753883/increase-the-number-of-messages-read-by-a-kafka-consumer-in-a-single-poll/51755259#51755259 – Ryuzaki L Jan 14 '20 at 00:27

1 Answer


max.poll.records is simply a maximum.

There are other properties that influence how many records you get:

  • fetch.min.bytes - The minimum amount of data the server should return for a fetch request. If insufficient data is available the request will wait for that much data to accumulate before answering the request. The default setting of 1 byte means that fetch requests are answered as soon as a single byte of data is available or the fetch request times out waiting for data to arrive. Setting this to something greater than 1 will cause the server to wait for larger amounts of data to accumulate which can improve server throughput a bit at the cost of some additional latency.
  • fetch.max.wait.ms - The maximum amount of time the server will block before answering the fetch request if there isn't sufficient data to immediately satisfy the requirement given by fetch.min.bytes.

See the documentation.
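
The way these two properties interact can be sketched as a toy model. This only illustrates the rule quoted above; it is not Kafka's broker code. The class `FetchWaitModel`, the method `waitMs`, and the steady `bytesPerMs` arrival rate are hypothetical names and simplifications introduced for this example.

```java
// Simplified model of the fetch.min.bytes / fetch.max.wait.ms rule.
// Assumption: new data arrives at a constant rate, which real topics rarely do.
final class FetchWaitModel {

    /**
     * Returns how long (in ms) a fetch request would be held before it is answered.
     *
     * @param availableBytes data already available when the request arrives
     * @param bytesPerMs     hypothetical steady arrival rate of new data
     * @param fetchMinBytes  value of fetch.min.bytes
     * @param fetchMaxWaitMs value of fetch.max.wait.ms
     */
    static long waitMs(long availableBytes, long bytesPerMs,
                       long fetchMinBytes, long fetchMaxWaitMs) {
        if (availableBytes >= fetchMinBytes) {
            return 0; // enough data already: answer immediately
        }
        if (bytesPerMs <= 0) {
            return fetchMaxWaitMs; // nothing arriving: held until the timeout
        }
        long missing = fetchMinBytes - availableBytes;
        long needed = (missing + bytesPerMs - 1) / bytesPerMs; // ceiling division
        return Math.min(needed, fetchMaxWaitMs);
    }

    public static void main(String[] args) {
        long minBytes = 1_000_000; // fetch.min.bytes from the question
        long maxWaitMs = 20_000;   // fetch.max.wait.ms from the question

        // A backlog larger than fetch.min.bytes is served without waiting.
        System.out.println(waitMs(5_000_000, 0, minBytes, maxWaitMs));
        // A drained topic with no producers waits for the full timeout.
        System.out.println(waitMs(500_000, 0, minBytes, maxWaitMs));
        // A slow trickle is answered once the minimum has accumulated.
        System.out.println(waitMs(500_000, 100, minBytes, maxWaitMs));
    }
}
```

In this model, a consumer reading a backlog never waits, so neither property changes the batch size; they only matter once less than fetch.min.bytes is available. That would be consistent with the roughly 20-second gap before the last, smaller batch in the question's log, although the thread does not confirm it.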

There is no way to exactly control the minimum number of records (unless they are all identical in length).
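
Because the limits are expressed in bytes, the record count per fetch follows from the record size. One possible reading of the roughly 33,000-record ceiling in the question, not confirmed anywhere in this thread, is that a byte limit is capping each fetch: the consumer property max.partition.fetch.bytes defaults to 1 MiB (1,048,576 bytes) per partition. The sketch below does only the arithmetic. It assumes a single partition and ignores record-batch overhead and compression. `BatchSizeEstimate` and its methods are hypothetical helpers, and the average record sizes are made-up inputs.

```java
// Back-of-the-envelope arithmetic relating a byte limit to a record count.
final class BatchSizeEstimate {

    // Kafka consumer default for max.partition.fetch.bytes (1 MiB).
    static final long DEFAULT_MAX_PARTITION_FETCH_BYTES = 1_048_576L;

    /** Upper bound on records that fit under a byte limit at a given average size. */
    static long maxRecordsPerFetch(long limitBytes, int avgRecordBytes) {
        if (avgRecordBytes <= 0) {
            throw new IllegalArgumentException("avgRecordBytes must be positive");
        }
        return limitBytes / avgRecordBytes;
    }

    /** Average record size implied by an observed batch, if the byte limit was hit. */
    static double impliedBytesPerRecord(long limitBytes, long recordsSeen) {
        if (recordsSeen <= 0) {
            throw new IllegalArgumentException("recordsSeen must be positive");
        }
        return (double) limitBytes / recordsSeen;
    }

    public static void main(String[] args) {
        // Assumed average record size of 31 bytes (hypothetical input).
        System.out.println(
            maxRecordsPerFetch(DEFAULT_MAX_PARTITION_FETCH_BYTES, 31));
        // Size implied by one of the ~33,400-record batches in the question's log.
        System.out.println(
            impliedBytesPerRecord(DEFAULT_MAX_PARTITION_FETCH_BYTES, 33_400));
    }
}
```

If the records in the topic really are around 30 bytes each, this would explain why raising max.poll.records alone changes nothing. In that case max.partition.fetch.bytes (and, for several partitions, fetch.max.bytes) under consumer.properties would be the settings to experiment with.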

Gary Russell
  • Thanks for the answer, but I still don't understand why the batch size is always different. I saw your answer here: https://stackoverflow.com/questions/50370851/kafka-spring-batch-listener-flush-batch, and with a size of 30_000 I got these results: Size::2812, Size::30000, Size::2839, Size::867. The batch sizes are always different. – All_Safe Oct 01 '19 at 08:29
  • You need to increase the properties I mentioned if you want to wait longer for a "full" batch to be available. – Gary Russell Oct 01 '19 at 13:01
  • Hi @Gary Russell, I updated my question, please take a look. I use the settings `fetch.min.bytes` and `fetch.max.wait.ms`, but they have no effect. – All_Safe Oct 04 '19 at 08:23