How do I get the size of a single record in Kafka?
There's some exposition below as to why I need this.
This does not appear to be the serializedValueSize exposed on the ConsumerRecord or RecordMetadata classes. I don't really understand the value of that property, since it doesn't match the message size that actually matters to the consumer. What is serializedValueSize used for, if not this?
I am trying to make my Kafka Java application behave as if a "min.poll.records" setting existed, to complement "max.poll.records". I have to do this because it's a requirement :). Assuming all messages on a given topic are the same size (which is true in this case), this should be possible from the consumer side by setting fetch.min.bytes to the number of messages to batch multiplied by the byte size of each message (see the sketch below).
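To make the goal concrete, here's a minimal sketch of the consumer configuration I have in mind. The broker address, group id, deserializers, record size, and batch size are all placeholders/assumptions for illustration:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MinBatchConsumer {
    public static void main(String[] args) {
        int assumedRecordSizeBytes = 146; // measured empirically (see below)
        int minRecordsPerPoll = 10;       // the "min.poll.records" I want

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "min-batch-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // The broker holds the fetch until this many bytes are available
        // or fetch.max.wait.ms elapses, whichever comes first.
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, assumedRecordSizeBytes * minRecordsPerPoll);
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // ... subscribe and poll as usual
    }
}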
This exists:
https://kafka.apache.org/documentation/#consumerapi
max.poll.records
The maximum number of records returned in a single call to poll().
This doesn't exist, but is the behavior I want:
min.poll.records
The minimum number of records returned in a single call to poll(). If not enough records are available before the time specified in fetch.max.wait.ms elapses, then the records are returned anyway, and as such, this is not an absolute minimum.
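Absent such a setting, the closest client-side emulation I can think of is to buffer records across poll() calls until a minimum count or a deadline is reached. This pollAtLeast helper is my own sketch, not a Kafka API:

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public final class PollUtil {
    // Keep polling until at least minRecords have accumulated or maxWait
    // elapses, mirroring the hypothetical min.poll.records semantics above.
    public static <K, V> List<ConsumerRecord<K, V>> pollAtLeast(
            Consumer<K, V> consumer, int minRecords, Duration maxWait) {
        List<ConsumerRecord<K, V>> buffer = new ArrayList<>();
        long deadline = System.nanoTime() + maxWait.toNanos();
        while (buffer.size() < minRecords && System.nanoTime() < deadline) {
            for (ConsumerRecord<K, V> record : consumer.poll(Duration.ofMillis(100))) {
                buffer.add(record);
            }
        }
        // As with fetch.max.wait.ms, this is not an absolute minimum:
        // the caller may get fewer records when the deadline hits.
        return buffer;
    }
}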
Here's what I've found so far:
On the producer side, I have "batch.size" set to 1 byte. This forces the producer to send each message individually.
On the consumer side, I have "max.partition.fetch.bytes" set to 291 bytes. This makes the consumer only ever get back 1 message. Setting this value to 292 sometimes makes the consumer get back 2 messages. So I have calculated the message size to be half of 292; the size of one message is 146 bytes.
The above bullets require changes to the Kafka configuration and involve manually looking at / grepping some server logs. It'd be great if the Kafka Java API provided this value.
On the producer side, Kafka provides a way to get the serialized size for a record via the RecordMetadata.serializedValueSize method. This value is 76 bytes, quite different from the 146 bytes measured in the test above.
On the consumer side, Kafka provides the ConsumerRecord API. The serialized value size from this record is also 76. The offset just increments by one each time (not by the byte size of the record).
The size of the key is -1 bytes (the key is null).
System.out.println(myRecordMetadata.serializedValueSize());
// 76
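For completeness, the consumer side reports the same values through the corresponding ConsumerRecord accessors (myConsumerRecord is a placeholder name, like myRecordMetadata above):

System.out.println(myConsumerRecord.serializedValueSize());
// 76
System.out.println(myConsumerRecord.serializedKeySize());
// -1 (the key is null)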
# producer
batch.size=1
# consumer
# Expected this to work:
# 76 * 2 = 152
max.partition.fetch.bytes=152
# Actually works:
# 292 = ??? magic ???
max.partition.fetch.bytes=292
I expected that setting max.partition.fetch.bytes to a multiple of the number of bytes given by serializedValueSize would make the Kafka consumer receive at most that many records from a poll. Instead, the max.partition.fetch.bytes value needs to be much higher for this to happen (a probe sketch follows).
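For reference, here's roughly how the trial-and-error above could be automated instead of grepping server logs. It's a sketch only: the topic name and broker address are placeholders, and it assumes the topic already contains enough equally-sized records:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class FetchSizeProbe {
    // How many records does one poll() return for a given
    // max.partition.fetch.bytes?
    static int recordsPerPoll(int maxPartitionFetchBytes) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "probe-" + maxPartitionFetchBytes);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, maxPartitionFetchBytes);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic")); // placeholder topic
            // Early polls can be empty while the group rebalances,
            // so retry a few times before giving up.
            for (int i = 0; i < 20; i++) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                if (!records.isEmpty()) {
                    return records.count();
                }
            }
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(recordsPerPoll(291)); // 1 in my tests
        System.out.println(recordsPerPoll(292)); // sometimes 2
    }
}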