10

I'm processing XML files and I need to send a Kafka message per record. When I receive the last record I close the Kafka producer. The problem is that the producer's send method is asynchronous, so sometimes when I close the producer it throws java.lang.IllegalStateException: Cannot send after the producer is closed. I've read somewhere that I can leave the producer open. My question is: what does that imply, and is there a better solution for this?

---Edit---

<list>
  <element attr1="" att2="" attr3=""/>
  <element attr1="" att2="" attr3=""/>
  <element attr1="" att2="" attr3=""/>
  <element attr1="" att2="" attr3=""/>
  <element attr1="" att2="" attr3=""/>
  <element attr1="" att2="" attr3=""/>
  <element attr1="" att2="" attr3=""/>
  <element attr1="" att2="" attr3=""/>
...
</list>

Imagine the following scenario:

  • We read the opening <list> tag and create the Kafka producer
  • For each element we read its attributes, generate a JSON object, and send it to Kafka using the send method
  • When we read the closing tag we call the close method on the producer

The problem: the number of elements can be 80k, so when we call the close method the producer is sometimes still sending messages asynchronously. We need to call the flush method first, but it impacts performance (the pattern is sketched below).
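A minimal sketch of that pattern, assuming a SAX DefaultHandler (the class name, topic, broker address, and JSON building here are illustrative, not from the original question):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler mirroring the scenario above.
public class XmlRecordHandler extends DefaultHandler {

    private Producer<String, String> producer;

    @Override
    public void startDocument() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("element".equals(qName)) {
            // Build a JSON payload from the attributes (simplified).
            String json = String.format("{\"attr1\":\"%s\",\"att2\":\"%s\",\"attr3\":\"%s\"}",
                    atts.getValue("attr1"), atts.getValue("att2"), atts.getValue("attr3"));
            // send() is asynchronous: it only enqueues the record into the
            // producer's internal buffer and returns immediately.
            producer.send(new ProducerRecord<>("my-topic", json));
        }
    }

    @Override
    public void endDocument() {
        // flush() blocks until all buffered records have actually been sent,
        // so close() no longer races with pending async sends.
        producer.flush();
        producer.close();
    }
}
```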

törzsmókus

1 Answer

10

You should call Producer.flush() before calling Producer.close(). flush() is a blocking call and will not return before all records have been sent.

If you don't call close(), depending on the implementation/language you might end up with resource/memory leaks.
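A minimal sketch of that shutdown sequence (the surrounding method, topic name, and record list are illustrative assumptions):

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FlushThenClose {

    static void sendAll(Properties props, List<String> jsonRecords) {
        Producer<String, String> producer = new KafkaProducer<>(props);
        try {
            for (String json : jsonRecords) {
                // Asynchronous: the record is only added to an internal buffer.
                producer.send(new ProducerRecord<>("my-topic", json));
            }
        } finally {
            producer.flush(); // blocks until every buffered record is sent (or fails)
            producer.close(); // then release sockets, the I/O thread, and buffer memory
        }
    }
}
```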

Matthias J. Sax
  • I tried this, but the performance is now terrible. In the previous version of Kafka there was a send(List) method; is there some way to do that with this version? – Alejandro Agapito Bautista Mar 30 '17 at 14:24
  • Not sure if I can follow... You say "when I receive the last record I close the kafka producer" -- thus, there should be only one call to .flush() and one call to .close() -- how can this impact your writing performance? – Matthias J. Sax Mar 30 '17 at 17:23
  • I'm processing an XML with SAX. When I detect a special tag I call the flush and close methods. If I don't, the performance is much better, but after a certain period of time I get an OutOfMemoryError. In the previous version of Kafka there was a method for sending a list of messages, but it is not available anymore, so I'm thinking about how to do this the best way – Alejandro Agapito Bautista Mar 30 '17 at 17:50
  • The Producer buffers records internally and sends them in batches. Thus, there is no need for "sending a list" because it's handled by the producer automatically. Can you describe the overall pattern you apply? When do you create a producer? When do you send? When do you call close? I am still not sure I understand what you are doing exactly... Also check out the docs: http://docs.confluent.io/current/clients/producer.html – Matthias J. Sax Mar 30 '17 at 20:34
  • You can use a single producer to process the whole XML -- no need to create a producer per tag. – Matthias J. Sax Mar 30 '17 at 22:14
  • Yes, I'm creating one producer per file, calling the send method per record, and when I receive the record that is the last one I close the producer – Alejandro Agapito Bautista Mar 30 '17 at 22:28
  • Sounds about right. No idea why your performance goes down... do you process multiple XML files? You can use a single producer for all of them, too. – Matthias J. Sax Mar 30 '17 at 22:29
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/139538/discussion-between-alejandro-agapito-bautista-and-matthias-j-sax). – Alejandro Agapito Bautista Mar 30 '17 at 22:41
  • Even if I flush and close the producer, I receive Cannot send after the producer is closed. – Alejandro Agapito Bautista Mar 31 '17 at 00:31
  • Of course -- you close the producer when you don't need it anymore... – Matthias J. Sax Apr 05 '17 at 18:15
  • Can you tell me please if I'm on the right path? I create a Kafka producer at app init and use it on every request to send records to the Kafka brokers. Is that safe, or should I use it ==> flush/close it ==> recreate a new one per endpoint request? – Chawki May 19 '23 at 11:43
  • Yes, you can create a producer just during init and use it as long as your application is running. Having long-lived producers is a very common pattern, and there is no need to close it and re-create a new one. – Matthias J. Sax May 19 '23 at 12:02
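Following up on the last comments, a minimal sketch of such a long-lived producer (the class name and config values are illustrative; KafkaProducer is thread-safe, so one instance can serve all requests):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical holder for one application-wide producer.
public final class SharedProducer {

    private static final Producer<String, String> PRODUCER = create();

    private static Producer<String, String> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // The producer batches buffered records automatically; these knobs
        // trade a little latency for larger batches (no "send a list" needed).
        props.put("linger.ms", "10");
        props.put("batch.size", "65536");
        Producer<String, String> p = new KafkaProducer<>(props);
        // Flush and close exactly once, at application shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            p.flush();
            p.close();
        }));
        return p;
    }

    public static void send(String topic, String value) {
        PRODUCER.send(new ProducerRecord<>(topic, value));
    }
}
```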