
I had my Kafka connectors paused, and upon restarting them I got these errors in my logs:

[2020-02-19 19:36:00,219] ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
************
************
[2020-02-19 19:36:00,216] ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to flush, timed out while waiting for producer to flush outstanding 2389 messages (org.apache.kafka.connect.runtime.WorkerSourceTask)

I got this error multiple times, each time with a different number of outstanding messages. Then it stopped, and I haven't seen it again.

Do I need to take any action here, or has Connect retried and committed the offsets, which would explain why the error has stopped?

Thanks

OneCricketeer
AnonymousAlias

1 Answer


The error indicates that a lot of messages are buffered and cannot be flushed before the timeout is reached. To address this, you can:

  • either increase the offset.flush.timeout.ms configuration parameter in your Kafka Connect worker config,
  • or reduce the amount of data being buffered by decreasing producer.buffer.memory in your Kafka Connect worker config. This tends to be the better option when you have fairly large messages.
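
As a rough sketch of the two options above, the worker properties file might look like the following. The file name and the chosen values are only illustrative assumptions, not recommendations. The defaults noted in the comments are the stock Kafka defaults, and the worker has to be restarted for the change to take effect.

```properties
# Kafka Connect worker config (file name assumed, e.g. connect-distributed.properties)

# Option 1: give the producer more time to flush before the offset commit
# times out. The default is 5000 ms; 10000 is an illustrative value.
offset.flush.timeout.ms=10000

# Option 2: buffer less data in the producer so a flush has less to drain.
# The default is 33554432 bytes (32 MB); 16 MB here is an illustrative value.
producer.buffer.memory=16777216
```

You would normally pick one of the two rather than apply both at once, so that the effect of the change is easy to observe in the logs.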
Giorgos Myrianthous
  • So this might have happened because the connectors were paused, and then a lot of data came through at once? – AnonymousAlias Feb 20 '20 at 09:58
  • @AnonymousAlias This might be the case. – Giorgos Myrianthous Feb 20 '20 at 09:59
  • Do you know if it would have retried and then flushed and committed, or will there be a loss of data so that I need to adjust settings and restart? – AnonymousAlias Feb 20 '20 at 09:59
  • Have you checked your connector's status? I assume that it is already in the `FAILED` state. – Giorgos Myrianthous Feb 20 '20 at 10:04
  • The connectors all seem fine: they are in "Running", and the Connect service on my instance is green and active. – AnonymousAlias Feb 20 '20 at 10:07
  • @AnonymousAlias I can't guarantee that no message was lost. I think it's better to run a couple of tests and make sure that there is nothing missing from your target. – Giorgos Myrianthous Feb 20 '20 at 10:08
  • That's just very hard to do, as the connectors are monitoring changes from CDC on a Postgres database, so there is a massive amount of files. I asked the team handling the data to check, but I was just curious. I will accept your answer anyway. If you had to take a guess about data loss, what would you say? – AnonymousAlias Feb 20 '20 at 10:10
  • If it failed to commit offsets, then the messages with uncommitted offsets should normally be committed on a later attempt. So it might be the case that you haven't lost any messages, but as I mentioned, I wouldn't be certain about that. – Giorgos Myrianthous Feb 20 '20 at 10:14
  • By the way, in the worst-case scenario I would suggest replaying the messages so that you can be certain that your target system is up to date. – Giorgos Myrianthous Feb 20 '20 at 10:23
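
The status check suggested in the comments can be sketched in Python against the Kafka Connect REST API (`GET /connectors/{name}/status`). This is a minimal sketch under some assumptions: the base URL `http://localhost:8083` and the connector name `wem-postgres-source` (inferred from the task id in the logs) are guesses, and the helper names `fetch_status` and `unhealthy_parts` are hypothetical. It looks at each task as well as the connector itself, because a connector can report `RUNNING` while one of its tasks is `FAILED`.

```python
import json
from urllib.request import urlopen


def fetch_status(base_url, connector):
    """Fetch the status document for one connector from the Connect REST API."""
    with urlopen(f"{base_url}/connectors/{connector}/status") as resp:
        return json.load(resp)


def unhealthy_parts(status):
    """Return (name, state) pairs for the connector and tasks not in RUNNING."""
    problems = []
    conn_state = status.get("connector", {}).get("state")
    if conn_state != "RUNNING":
        problems.append(("connector", conn_state))
    for task in status.get("tasks", []):
        if task.get("state") != "RUNNING":
            problems.append((f"task-{task.get('id')}", task.get("state")))
    return problems


# Hand-written sample in the shape of a status response (not real output).
sample = {
    "name": "wem-postgres-source",
    "connector": {"state": "RUNNING", "worker_id": "10.0.0.1:8083"},
    "tasks": [{"id": 0, "state": "FAILED", "worker_id": "10.0.0.1:8083"}],
}
print(unhealthy_parts(sample))

# Against a live worker (assumed URL and connector name):
# print(unhealthy_parts(fetch_status("http://localhost:8083", "wem-postgres-source")))
```

An empty list only means that the connector and all of its tasks are currently running. It does not prove that no records were lost, so comparing the target against the source is still the only way to be sure.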