
Hello, I am currently trying to send data from Kafka (AWS MSK) to Elasticsearch via the Confluent connector. I have a stream of data coming in from a SQL database. When I run the Confluent connector, it runs for a while sending data to Elasticsearch, but stops after a minute or two with the error below.

sending LeaveGroup request to coordinator b-2.*****.amazonaws.com:9092 due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.

There is a lot of data to stream through, and I am wondering if there is a better way, specific settings to send multiple batches at once, or some other way to fix this error. Is my Elasticsearch server too small?

If you need any other information, please let me know. Thank you.


1 Answer


As the error message suggests, it seems that writing messages to Elasticsearch (I assume this is a sink connector) is taking too much time, and therefore the consumer client (which in this case is the connector) is being forced to leave the consumer group because of timeouts. To isolate this, you should increase the value of the `max.poll.interval.ms` property and see if the problem persists. If it does, it may mean that indexing in Elasticsearch is taking too long, which could indicate that your Elasticsearch cluster is too busy or undersized.
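For example, assuming the connector runs on a Kafka Connect worker that allows per-connector client overrides (`connector.client.config.override.policy=All` on the worker), a minimal sketch of raising the timeout through the Connect REST API could look like the following; the worker URL and connector name are placeholders, not values from your setup:

```python
import json
import requests

# Placeholder values: point these at your Connect worker and connector.
CONNECT_URL = "http://localhost:8083"
CONNECTOR = "elasticsearch-sink"

# Fetch the current connector config, raise the poll interval, and push it back.
config = requests.get(f"{CONNECT_URL}/connectors/{CONNECTOR}/config").json()
config["consumer.override.max.poll.interval.ms"] = "600000"  # 10 minutes
config["consumer.override.max.poll.records"] = "250"         # fewer records per poll

resp = requests.put(
    f"{CONNECT_URL}/connectors/{CONNECTOR}/config",
    headers={"Content-Type": "application/json"},
    data=json.dumps(config),
)
resp.raise_for_status()
```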

  • Alternatively, you can check the `indices.indexing.index_time_in_millis` metric on Elasticsearch to confirm whether indexing is indeed taking too long.
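A quick way to pull that metric is the indices stats API. Here is a minimal sketch, assuming Elasticsearch is reachable at a placeholder `localhost:9200`:

```python
import requests

ES_URL = "http://localhost:9200"  # placeholder: your Elasticsearch endpoint

# The indices stats API reports cumulative indexing time per index.
stats = requests.get(f"{ES_URL}/_stats/indexing").json()

for index, data in stats["indices"].items():
    indexing = data["total"]["indexing"]
    ops = indexing["index_total"]
    millis = indexing["index_time_in_millis"]
    avg = millis / ops if ops else 0.0
    print(f"{index}: {ops} docs, {millis} ms total, {avg:.2f} ms/doc")
```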

It is also important to check the health of your Kafka Connect cluster. If for some reason it is unstable, the tasks from the Elasticsearch connector will be rebalanced, which also forces the consumer client to leave the consumer group.
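The Connect REST API has a per-connector status endpoint that shows whether the connector and its tasks are running, rebalancing, or failed. A small sketch, again with placeholder URL and connector name:

```python
import requests

CONNECT_URL = "http://localhost:8083"  # placeholder worker address
CONNECTOR = "elasticsearch-sink"       # placeholder connector name

status = requests.get(f"{CONNECT_URL}/connectors/{CONNECTOR}/status").json()
print("connector state:", status["connector"]["state"])
for task in status["tasks"]:
    print(f"task {task['id']}: {task['state']}")
    if task["state"] == "FAILED":
        print(task.get("trace", ""))  # stack trace of the failure, if present
```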

It is a game of eliminating options until you find the root cause of the problem ¯\_(ツ)_/¯

Ricardo Ferreira
  • Great, thank you. I am pretty new to Kafka and Elasticsearch. Where can I change this property, `max.poll.interval.ms`? Is there a config file that needs to be changed? – bob Ditusa Feb 03 '21 at 16:55
  • You can override the worker configuration of Kafka Connect to set specific properties for producers and consumers. Here is a link that describes how to do this: https://docs.confluent.io/platform/current/connect/references/allconfigs.html#override-the-worker-configuration – Ricardo Ferreira Feb 03 '21 at 20:16
  • I will take a look. It actually looks like the ES was too small. I increased the size and it does not seem to cause that error anymore. It still seems super slow to push data from the consumer to ES, though. Any idea if there is a way to make it go faster? Is there something in the Elasticsearch config that can help with batching more at a time? – bob Ditusa Feb 04 '21 at 02:07
  • I think the best thing to do to increase ES throughput while handling requests is to send API calls in batches, which AFAIK is what the connector does behind the scenes, as it uses the Bulk API from ES. It would help to check whether batching is enabled in the connector. Other than this, run a load test against ES to see if it is undersized. If it is, given the load you're putting the system under, then there is nothing more you can do. – Ricardo Ferreira Feb 04 '21 at 13:59
  • Thank you. I found out how to reduce the size of the batches for the connector. I am going to try that out and see if that helps. – bob Ditusa Feb 05 '21 at 14:59
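Following up on the batching discussion in the comments above: the Elasticsearch sink connector exposes a few batching properties (exact names and defaults can vary by connector version, so treat this as a sketch with placeholder values rather than a recipe):

```python
import json
import requests

CONNECT_URL = "http://localhost:8083"  # placeholder worker address
CONNECTOR = "elasticsearch-sink"       # placeholder connector name

config = requests.get(f"{CONNECT_URL}/connectors/{CONNECTOR}/config").json()

# Smaller bulk requests finish faster, which helps avoid poll timeouts;
# larger ones give more throughput if the ES cluster can keep up.
config["batch.size"] = "500"              # records per bulk request
config["linger.ms"] = "1000"              # wait up to 1s to fill a batch
config["max.buffered.records"] = "10000"  # records buffered before blocking
config["flush.timeout.ms"] = "60000"      # how long to wait for a flush

resp = requests.put(
    f"{CONNECT_URL}/connectors/{CONNECTOR}/config",
    headers={"Content-Type": "application/json"},
    data=json.dumps(config),
)
resp.raise_for_status()
```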