
I have a Kafka consumer group consuming several topics (each topic has more than one partition). All topics contain a considerable number of records on each partition. I'm currently trying to make sense of the behavior when the consumer initially starts consuming. In particular, I'd like to know how the broker decides which records reach the client first.

The following aspects are noteworthy:

  • There are a lot more records than the consumer can process in one single roundtrip (i.e. more records than the consumer's max.poll.records configuration)
  • There are records from several topics and several partitions that the consumer has to read
  • I naively assumed that the broker returns records for each topic in each poll loop, so that the consumer reads all the topics at a similar pace. That doesn't seem to be the case, though. Apparently the broker prioritizes records for a single topic at a time, switching topics without an obvious pattern (at least that's what I'm seeing in my consumer's metrics).

I couldn't find anything in the consumer config parameters that allows me to change this behavior. It's not really a problem, because all messages get read eventually. But I would like to understand the behavior in more detail.

So my question is: How does the broker decide which records end up in the result of a consumer's poll loop?

l7r7

1 Answer


Consumers fetch records from Kafka using Fetch requests.

If you look at the protocol, this API is pretty complex and has many fields, but we can focus on the few that are relevant to your question:

  • max_wait_ms: This indicates how long the broker should wait if there are no (or not enough) records available. This is configurable using fetch.max.wait.ms.
  • min_bytes: This indicates the minimum amount of data (total size of records) the broker should have available before responding. This is configurable using fetch.min.bytes.
  • max_bytes: This indicates the maximum size of a response. This is configurable using fetch.max.bytes.

As soon as the broker hits one of these limits, it will send a response back.

The Fetch request also indicates which partitions the consumer wants to read. For each partition, a partition_max_bytes field indicates the maximum amount of data to return for that partition. This is configurable using max.partition.fetch.bytes.
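To make the mapping between protocol fields and configuration concrete, here is a minimal sketch of a consumer set up with all four settings. The bootstrap server, group id, and the specific values are placeholders for illustration, not recommendations:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class FetchConfigExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "fetch-config-example");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            // max_wait_ms: broker waits up to 500ms if min_bytes is not yet available
            props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500");
            // min_bytes: broker responds as soon as at least 1 byte is available
            props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1");
            // max_bytes: the whole Fetch response is capped at 50MB
            props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, String.valueOf(50 * 1024 * 1024));
            // partition_max_bytes: each partition contributes at most 1MB per response
            props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, String.valueOf(1024 * 1024));

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.close();
        }
    }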

In the past, Fetch requests contained the full list of partitions. The broker would iterate over that list in order until it hit one of the limits mentioned above.

Since Kafka 1.1 (KIP-227), it's a bit more complicated, as consumers use fetch sessions to avoid sending the full list of partitions in every Fetch request. To keep it simple: brokers use FetchSessions to keep an iterator on the partition list, ensuring records are fetched from all partitions fairly.
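This is not the actual broker code (that lives in FetchSession.scala and ReplicaManager.scala), but conceptually the session behaves like the following simplified sketch. The class and method names are made up for illustration; the key point is that the cursor survives across requests, so partitions skipped in one response come first in the next:

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Toy model of a fetch session: a partition list plus a cursor that
    // persists across Fetch requests (names are illustrative only).
    public class FetchSessionSketch {
        private final List<String> partitions;
        private int cursor = 0; // survives across requests

        public FetchSessionSketch(List<String> partitions) {
            this.partitions = new ArrayList<>(partitions);
        }

        // Fill a response starting where the previous request left off,
        // stopping once the response-level max_bytes limit is reached.
        public Map<String, Integer> handleFetch(int maxBytes, int partitionMaxBytes) {
            Map<String, Integer> response = new LinkedHashMap<>();
            int bytesSoFar = 0;
            for (int i = 0; i < partitions.size() && bytesSoFar < maxBytes; i++) {
                String partition = partitions.get(cursor);
                cursor = (cursor + 1) % partitions.size();
                int bytes = Math.min(partitionMaxBytes, maxBytes - bytesSoFar);
                response.put(partition, bytes); // pretend we returned `bytes` of records
                bytesSoFar += bytes;
            }
            return response;
        }

        public static void main(String[] args) {
            FetchSessionSketch session =
                new FetchSessionSketch(List.of("topicA-0", "topicA-1", "topicB-0", "topicB-1"));
            // With max_bytes twice partition_max_bytes, each response only
            // covers two partitions, but successive responses rotate through all:
            System.out.println(session.handleFetch(2_000_000, 1_000_000)); // topicA-0, topicA-1
            System.out.println(session.handleFetch(2_000_000, 1_000_000)); // topicB-0, topicB-1
        }
    }

This also matches what you observed: any single response can be dominated by one topic's partitions, but the rotation evens things out across polls.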

Now let's look at the client side ...

At this point, you may have noticed that I've not mentioned max.poll.records. This setting is only used on the client side. Consumers try to fetch records efficiently, so even if you set max.poll.records=1, a consumer may fetch records in large batches, keep them in memory, and return only one record each time poll() is called. This avoids sending many small requests and overloading brokers unnecessarily.
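You can observe this with a sketch like the one below (the topic name and connection details are placeholders): each poll() returns at most one record, yet the consumer's fetch metrics still show large batches arriving over the wire.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class SingleRecordPollExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "single-record-example");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // Client-side limit only: Fetch requests to the broker are unaffected
            props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("my-topic"));
                while (true) {
                    // Usually served straight from the in-memory buffer,
                    // without a network round trip to the broker
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("%s-%d@%d%n",
                            record.topic(), record.partition(), record.offset());
                    }
                }
            }
        }
    }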

The consumer also keeps track of the records it has in memory. If it already has buffered records for a partition, it does not include that partition in the next Fetch request.

So while each Fetch response may not include data from all partitions, over a period of time all partitions should be fetched fairly.
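Again, this is not the real implementation (that's Fetcher.java), but conceptually the client-side buffering works something like this sketch (class and method names are invented for illustration):

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Toy model of the consumer's fetch buffer (the real logic is in
    // Fetcher.java): records are buffered per partition as responses arrive.
    public class FetchBufferSketch {
        private final Map<String, Deque<String>> buffered = new HashMap<>();
        private final int maxPollRecords;

        public FetchBufferSketch(int maxPollRecords) {
            this.maxPollRecords = maxPollRecords;
        }

        // A Fetch response arrived: buffer its records per partition
        public void onFetchResponse(String partition, List<String> records) {
            buffered.computeIfAbsent(partition, p -> new ArrayDeque<>()).addAll(records);
        }

        // Only partitions with an empty buffer go into the next Fetch request,
        // so partitions that already have data in memory are not re-fetched
        public List<String> partitionsToFetch(List<String> assigned) {
            List<String> result = new ArrayList<>();
            for (String partition : assigned) {
                Deque<String> queue = buffered.get(partition);
                if (queue == null || queue.isEmpty()) {
                    result.add(partition);
                }
            }
            return result;
        }

        // poll() drains at most maxPollRecords from the buffer; no network
        // round trip happens while buffered records remain
        public List<String> poll() {
            List<String> result = new ArrayList<>();
            for (Deque<String> queue : buffered.values()) {
                while (result.size() < maxPollRecords && !queue.isEmpty()) {
                    result.add(queue.poll());
                }
                if (result.size() >= maxPollRecords) {
                    break;
                }
            }
            return result;
        }
    }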


I've simplified the process to keep it short, but if you want to dive into this logic, I'd recommend checking the following classes:

  • Fetcher.java: This is the client side logic that determines what to fetch from brokers and what to return in poll().
  • ReplicaManager.scala: This is the server side logic that determines what to return in a Fetch response. See fetchMessages().
  • FetchSession.scala: This is the session logic introduced by KIP-227.
Mickael Maison