How do I produce to a Kafka topic using message batching or buffering with pykafka? I mean, one producer should produce many messages in a single produce process. I know the concept of message batching/buffering, but I don't know how to implement it. I hope someone can help me here.
2 Answers
PyKafka handles message batching in the producer transparently: you don't have to do anything special to make sure messages are produced in batches. The Producer class offers a bunch of configuration options to let you customize the batching behavior. The full list of these options is available in the documentation, but a few of the most important ones are:
- max_queued_messages: when you've produce()d more messages than this, send the batch immediately
- min_queued_messages: when you've produce()d at least this many messages, send the batch
- linger_ms: when this much time has passed since the last batch, send the batch
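A minimal sketch of what this looks like in practice, assuming the pykafka package is installed and a broker is reachable. `KafkaClient`, `topic.get_producer()` and `produce()` are pykafka's own API; the `produce_many` helper and the option values are illustrative assumptions, not recommendations.

```python
# Sketch only: assumes the pykafka package and a reachable broker. The helper
# name and the option values below are illustrative, not recommendations.
PRODUCER_CONFIG = {
    "min_queued_messages": 500,    # flush once this many messages are queued
    "max_queued_messages": 10000,  # cap on messages waiting in the queue
    "linger_ms": 100,              # flush queued messages at least this often (ms)
}


def produce_many(hosts, topic_name, messages, config=PRODUCER_CONFIG):
    """Produce every bytes message in `messages` through one batching producer."""
    # Imported here so this module can be loaded without pykafka installed.
    from pykafka import KafkaClient

    client = KafkaClient(hosts=hosts)
    topic = client.topics[topic_name]
    # Exiting the `with` block stops the producer, which waits for queued
    # messages to be delivered before returning.
    with topic.get_producer(**config) as producer:
        for message in messages:
            producer.produce(message)  # enqueued; pykafka sends it in a batch
```

For example, `produce_many("127.0.0.1:9092", "my.test", [b"a", b"b", b"c"])` (hypothetical host and topic) queues three messages in one producer; they go out together when one of the thresholds above is reached or when the `with` block exits.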

Just use the send() method. You do not need to manage batching yourself.
send() is asynchronous. When called, it adds the record to a buffer of pending record sends and immediately returns. This allows the producer to batch together individual records for efficiency.
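A minimal sketch, assuming the kafka-python package (the library whose `send()` this answer describes, per the comment below) and a reachable broker. Every record is handed to `send()`, which returns a future right away, and `flush()` then blocks until the buffered batches have been sent. The `send_many` helper and its default `batch_size`/`linger_ms` values are illustrative assumptions.

```python
# Sketch only: assumes the kafka-python package ("kafka" module) and a
# reachable broker; send_many and its default values are illustrative.
def send_many(bootstrap_servers, topic, values, batch_size=16384, linger_ms=10):
    """Queue every bytes value with send(), then wait for delivery."""
    # Imported here so this module can be loaded without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        batch_size=batch_size,  # max bytes per per-partition batch
        linger_ms=linger_ms,    # how long a partly filled batch may wait
    )
    try:
        # send() appends the record to a buffer and returns a future at once.
        futures = [producer.send(topic, value) for value in values]
        producer.flush()  # block until every buffered record has been sent
        # Each resolved future yields RecordMetadata (topic, partition, offset).
        return [future.get(timeout=10) for future in futures]
    finally:
        producer.close()
```

The caller never groups records itself; batching happens inside the producer's buffers between the `send()` calls and the `flush()`.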
All you need to do is configure the two properties that control this: batch_size and linger_ms.
The producer maintains buffers of unsent records for each partition. These buffers are of a size specified by the ‘batch_size’ config. Making this larger can result in more batching, but requires more memory (since we will generally have one of these buffers for each active partition).
The two properties interact as follows:
once we get batch_size worth of records for a partition, the batch is sent immediately regardless of the linger_ms setting; however, if we have fewer than this many bytes accumulated for this partition, we will ‘linger’ for the specified time waiting for more records to show up.
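To make the interaction of the two settings concrete, here is a toy model of one partition's buffer. It is a simplification written for illustration, not kafka-python's actual accumulator: a batch is ready when it holds at least `batch_size` bytes, or when `linger_ms` has passed since its first record arrived. The `PartitionBuffer` class is hypothetical.

```python
# Toy model of the flush rule described above; it is NOT kafka-python's code.
# Sizes are in bytes and times in milliseconds, matching batch_size and linger_ms.
class PartitionBuffer:
    def __init__(self, batch_size, linger_ms):
        self.batch_size = batch_size
        self.linger_ms = linger_ms
        self.records = []
        self.nbytes = 0
        self.first_append_ms = None  # when the current batch started filling

    def append(self, record, now_ms):
        """Add a bytes record to the pending batch."""
        if not self.records:
            self.first_append_ms = now_ms
        self.records.append(record)
        self.nbytes += len(record)

    def ready(self, now_ms):
        """A batch is sent when it is full, or when it has lingered long enough."""
        if not self.records:
            return False
        if self.nbytes >= self.batch_size:
            return True  # full: send immediately, linger_ms is ignored
        return now_ms - self.first_append_ms >= self.linger_ms

    def drain(self):
        """Hand the pending batch to the sender and start a new, empty one."""
        batch, self.records, self.nbytes = self.records, [], 0
        self.first_append_ms = None
        return batch
```

With `batch_size=100` and `linger_ms=10`, a single 40-byte record is not ready after 5 ms but is ready at 10 ms, whereas two records totalling 110 bytes are ready immediately. This is the trade-off the answer describes: a larger batch_size allows bigger batches, and linger_ms bounds how long a partly filled one waits.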

This answer applies to kafka-python, not pykafka as the OP specified. https://github.com/dpkp/kafka-python/blob/0c78f704520a42d0935cb87298dd69f8e4af5894/kafka/producer/kafka.py#L53 – Emmett Butler Sep 14 '17 at 18:54