I made a Kafka producer using kafka-python to send records to a remote broker. If there is a network connection problem that lasts longer than request_timeout_ms (here 20 s), the callback gives me this exception:

KafkaTimeoutError: Batch for TopicPartition(topic='first_topic', partition=0) containing 117 record(s) expired: 20 seconds have passed since last append

I don't want to set the timeout too high, so instead I would like to write the expired records to local storage. The next step would be a producer that consumes the records from this storage (if it is not empty) and sends them to the remote broker if/when it can. The records are critical, which is why I want to store them rather than drop them; the ordering does not matter. So how can I tell the producer not to simply delete expired records, but to write them somewhere first?
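
For reference, here is a stripped-down sketch of the kind of thing I am after: the producer I already have, plus an errback that would spill expired records to a local file instead of just logging them. The broker address, topic, and spill file below are placeholders, not my real setup:

```python
import json
import logging
from functools import partial

from kafka import KafkaProducer

log = logging.getLogger(__name__)

# Placeholder broker address; timeout matches the error above.
producer = KafkaProducer(
    bootstrap_servers="remote-broker:9092",
    request_timeout_ms=20000,  # 20 s
)

SPILL_FILE = "expired_records.jsonl"  # placeholder local storage

def on_send_error(key, value, exc):
    # Called by kafka-python when the record's future fails,
    # e.g. with KafkaTimeoutError after the batch expires.
    log.error("Record expired, spilling to disk: %s", exc)
    with open(SPILL_FILE, "a") as f:
        f.write(json.dumps({
            "key": key.decode() if key else None,
            "value": value.decode(),
        }) + "\n")

def send(topic, key, value):
    future = producer.send(topic, key=key, value=value)
    # Bind the record to the errback so it is still available
    # when the batch expires and the future fails.
    future.add_errback(partial(on_send_error, key, value))

send("first_topic", b"some-key", b"some-value")
producer.flush()
```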

Thanks for your help

  • Even if you stored them locally, you'd still need to create a producer and that would still drop messages under similar conditions... How do you plan on detecting "when it can"? – OneCricketeer Nov 04 '21 at 16:01
  • Depending on the data, you could _try_ writing data to SQLite, then use the Kafka Connect JDBC connector to consume those (durable) records. – OneCricketeer Nov 04 '21 at 16:02
  • It's hard to say what a solution is without knowing more about the problem. – Fermi-4 Nov 04 '21 at 16:10
  • Maybe you can use a DLQ (dead letter queue). – Vaibs Nov 04 '21 at 17:44
  • @OneCricketeer my plan is to write the expired data to disk just before it is deleted. For now, the producer just deletes the expired data and logs it as an error, and I cannot do anything before that happens. Once the data is written to a folder, I just need a producer that, every day for example, reads this data and sends it to the remote broker. SQLite may be a good solution, but I can't see where/how to tell my Kafka producer to do something with expired data instead of deleting it directly – noam Nov 08 '21 at 08:08
  • @Vaibs Can it work with expired data? And will my producer save this data into the DLQ and then delete it from my first batch? – noam Nov 08 '21 at 08:12
  • I don't understand what you're saying about expired data because that's tangential to batching the data beforehand (obviously you'll need to consider local storage space, and you're ignoring the fact that a producer batches data on its own anyway). You can store timestamps or boolean `was_produced` columns as part of the database row along with BLOB types for the key+value. Then you have some thread that would periodically grab several rows, produce it, then clean the database table. – OneCricketeer Nov 08 '21 at 08:21
  • @OneCricketeer I see what you mean and that solution may work, but the main objective was to avoid storing too much data, because that would require a large storage space next to our producer (~500 million messages per day). That is why I would like to tell my producer to save only these 117 records (from my example above), not all the others. It may not be as easy as I thought. – noam Nov 09 '21 at 15:47
  • Not sure if this helps, but my initial thought from your description is similar to the ["leaf buffers" described by Yelp's Kafka architecture](https://engineeringblog.yelp.com/2020/01/streams-and-monk-how-yelp-approaches-kafka-in-2020.html) – OneCricketeer Nov 09 '21 at 15:50
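
As a rough sketch of the SQLite buffer described in the comment above (BLOB key/value columns plus a `was_produced` flag, and a periodic job that re-produces rows and cleans the table); the file name, table layout, and broker address are only illustrative:

```python
import sqlite3
import time

from kafka import KafkaProducer

# Hypothetical local buffer: records go to SQLite first, and a periodic
# drain re-produces them and deletes the ones that were acknowledged.
conn = sqlite3.connect("producer_buffer.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS buffer (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        key BLOB,
        value BLOB,
        created_at REAL,
        was_produced INTEGER DEFAULT 0
    )
""")

producer = KafkaProducer(bootstrap_servers="remote-broker:9092")

def enqueue(key, value):
    # key and value are bytes (key may be None).
    conn.execute(
        "INSERT INTO buffer (key, value, created_at) VALUES (?, ?, ?)",
        (key, value, time.time()),
    )
    conn.commit()

def drain(topic, batch_size=1000):
    rows = conn.execute(
        "SELECT id, key, value FROM buffer WHERE was_produced = 0 LIMIT ?",
        (batch_size,),
    ).fetchall()
    futures = [(row_id, producer.send(topic, key=key, value=value))
               for row_id, key, value in rows]
    producer.flush()  # rows whose send failed stay in the table for the next run
    sent = [(row_id,) for row_id, fut in futures if fut.succeeded()]
    conn.executemany("UPDATE buffer SET was_produced = 1 WHERE id = ?", sent)
    conn.execute("DELETE FROM buffer WHERE was_produced = 1")
    conn.commit()
```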

0 Answers