1

I have a Kafka Connect S3 Sink writing records to Amazon S3. This particular sink is writing about 4k rec/sec. Every few days, one of the Kafka Connect worker tasks fails with the following error. A manual restart completely fixes the issue until it happens again a few days later.

I also increased "s3.part.retries" to "10" from its default of "3" but that seemed to have no effect.

Is there any other solution to this?

I'm running Confluent 5.0.1 with Kafka 2.0.1. I don't see any relevant changes in the latest and greatest Confluent 5.1.2 (https://docs.confluent.io/current/connect/kafka-connect-s3/changelog.html)

"org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
    at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:586)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Multipart upload failed to complete.
    at io.confluent.connect.s3.storage.S3OutputStream.commit(S3OutputStream.java:160)
    at io.confluent.connect.s3.format.avro.AvroRecordWriterProvider$1.commit(AvroRecordWriterProvider.java:97)
    at io.confluent.connect.s3.TopicPartitionWriter.commitFile(TopicPartitionWriter.java:505)
    at io.confluent.connect.s3.TopicPartitionWriter.commitFiles(TopicPartitionWriter.java:485)
    at io.confluent.connect.s3.TopicPartitionWriter.executeState(TopicPartitionWriter.java:223)
    at io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:176)
    at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:195)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:564)
    ... 10 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: We encountered an internal error. Please try again. (Service: null; Status Code: 0; Error Code: InternalError; Request ID: 08C85ACBE86D55E1), S3 Extended Request ID: 04dLa3n9JpDNKdGesZc9jrNg1Jstx5mwMB6fFEm+7ZpkFz+ivkn4IN7AlRsq894+YuaLbc2BHuM=
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CompleteMultipartUploadHandler.doEndElement(XmlResponsesSaxParser.java:1773)
    at com.amazonaws.services.s3.model.transform.AbstractHandler.endElement(AbstractHandler.java:52)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:142)
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseCompleteMultipartUploadResponse(XmlResponsesSaxParser.java:462)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$CompleteMultipartUploadResultUnmarshaller.unmarshall(Unmarshallers.java:230)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$CompleteMultipartUploadResultUnmarshaller.unmarshall(Unmarshallers.java:227)
    at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
    at com.amazonaws.services.s3.internal.ResponseHeaderHandlerChain.handle(ResponseHeaderHandlerChain.java:44)
    at com.amazonaws.services.s3.internal.ResponseHeaderHandlerChain.handle(ResponseHeaderHandlerChain.java:30)
    at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1501)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1222)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4185)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4132)
    at com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:2933)
    at io.confluent.connect.s3.storage.S3OutputStream$MultipartUpload.complete(S3OutputStream.java:246)
    at io.confluent.connect.s3.storage.S3OutputStream.commit(S3OutputStream.java:156)
    ... 17 more
OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
clay
  • 18,138
  • 28
  • 107
  • 192
  • Probably better to reach out to AWS support and give them those request IDs – OneCricketeer Mar 06 '19 at 19:29
  • Have you solved your problem ? I am facing the same – Yassir S Jun 29 '20 at 09:09
  • My solution was to setup monitoring + alerting. This error happens every month or so, it triggers an alert, someone needs to manually restart the connector. It does seem to happen less often more recently, so that's tolerable. – clay Jul 02 '20 at 21:43

0 Answers0