Questions tagged [s3-kafka-connector]

Apache Kafka Connector questions related to S3 source/sink plugins

This tag is ambiguous because several competing connector.class implementations can be used to connect Kafka and S3. Examples include the classes written by Confluent.
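As a hedged illustration of why the class matters, the sketch below builds a minimal sink configuration in which connector.class selects Confluent's implementation. The helper name `build_s3_sink_config`, the connector name, topic, bucket, and region are placeholders invented for this example; a different vendor's plugin would need a different class and its own property names.

```python
import json

# Class name of Confluent's S3 sink plugin; other vendors' plugins
# use different classes and different property names.
CONFLUENT_S3_SINK = "io.confluent.connect.s3.S3SinkConnector"


def build_s3_sink_config(name, topics, bucket, region, flush_size=1000):
    """Return a Kafka Connect REST payload for a Confluent S3 sink (hypothetical helper)."""
    return {
        "name": name,
        "config": {
            "connector.class": CONFLUENT_S3_SINK,
            "topics": ",".join(topics),
            "s3.bucket.name": bucket,
            "s3.region": region,
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
            # Kafka Connect accepts config values as strings.
            "flush.size": str(flush_size),
            "tasks.max": "1",
        },
    }


if __name__ == "__main__":
    # Placeholder values only; nothing here talks to a real cluster.
    payload = build_s3_sink_config("s3-test", ["events"], "example-bucket", "us-east-1")
    print(json.dumps(payload, indent=2))
```

The printed JSON has the shape a Kafka Connect worker expects when a connector is created over its REST interface; when asking a question under this tag, stating the connector.class value up front removes the ambiguity.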

63 questions
3
votes
1 answer

Confluent Kafka S3 sink connector throws `java.lang.NoClassDefFoundError: com/google/common/base/Preconditions` when using Parquet format

When using Confluent S3 sink connector, the following error happens: [2021-08-08 02:25:15,588] ERROR WorkerSinkTask{id=s3-test-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually…
dz902
  • 4,782
  • 38
  • 41
3
votes
1 answer

Is there a way in the S3 Kafka sink connector to ensure all records are consumed

I have a problem with the S3 Kafka connector, and I have also seen it with the JDBC connector. I'm trying to see how I can ensure that my connectors are actually consuming all the data in a certain topic. I expect because of the flush sizes that there could…
Miguel Costa
  • 627
  • 1
  • 12
  • 30
2
votes
1 answer

kafka connect s3 source not working with Minio

I have verified the connection to minio, making sure that the credentials are working fine and minio is reachable. Also if I try any other value for store.url = http://minio:9000 I am not able to save the config, so I guess that there is no issue in…
2
votes
1 answer

How to extract nested field from Envelope type schema in s3 sink connector

Avro schema : { "type": "record", "name": "Envelope", "namespace": "test", "fields": [ { "name": "before", "type": [ "null", { "type": "record", "name": "Value", "fields": [ …
2
votes
1 answer

Parsing issues in Confluent S3 sink connector [serialization error]

I am using the Confluent S3 sink connector with Confluent's Kafka code as the base for Kafka Connect (v5.2.1). Originally, MySQL CDC is written as JSON (using Maxwell) into a Kafka topic (no schema written). This Kafka connector reads data from the above apache…
uzumas
  • 632
  • 1
  • 8
  • 23
2
votes
1 answer

Process messages with Null inside array in Kafka connect S3 connector

I'm using Kafka Connect with 2 connectors: Debezium to pull data from Postgres to Kafka, and the S3 connector to save data from Kafka to S3. While running I got this error from the S3 connector: java.lang.NullPointerException: Array contains a null element…
shlomiLan
  • 659
  • 1
  • 9
  • 33
2
votes
1 answer

Spark unable to read DECIMAL columns in Parquet files written by AvroParquetWriter

I have some Parquet files written using AvroParquetWriter (from the Kafka Connect S3 connector). One of the columns in the file, aseg_lat, has the schema DECIMAL(9, 7). I can read that column perfectly fine using both PyArrow and PrestoSQL. Trying to read…
1
vote
0 answers

Is there a way to reduce lag in a Kafka Consumer group?

My team and I are facing consumer lag problems on specific Kafka topics. We have an s3-sink connector configured like this: apiVersion: kafka.strimzi.io/v1beta2 kind: KafkaConnector metadata: name: dataplatform-s3-sink-connector-1 …
1
vote
1 answer

Reading files from S3 to kafka topic

I have a situation wherein all the event data is getting stored in an s3 bucket and I need to fetch that from S3 to Kafka topic on ec2. I am using CamelAWSS3Connector and am facing issues of the connector not working. Following is the error I am…
1
vote
1 answer

minio folder under a bucket is not automatically created by kafka s3 sink connector distributed mode

I've set up kafka s3 sink connector in both standalone mode and distributed mode. They both worked fine, however I observe a difference in the behavior on minio storage side. I specified topics.dir in sink properties to store the parquet files…
1
vote
1 answer

Can I use a single s3-sink connector to point to the same field name for the timeStamp field when different topics use different Avro schemas?

schema for topic t1 { "type": "record", "name": "Envelope", "namespace": "t1", "fields": [ { "name": "before", "type": [ "null", { "type": "record", "name": "Value", "fields": [ …
1
vote
1 answer

Message loss/missing from same topic, read with different consumer group in Kafka

I have been encountering a weird issue with Kafka and the Confluent sink connector that I am using in my setup. I have a system wherein two Kafka Connect sinks work on the same Kafka topic. I have an S3 Connect sink and an Elastic sink; both are…
Ashit_Kumar
  • 601
  • 2
  • 10
  • 28
1
vote
2 answers

Is there a way to update connector configuration using MSK-connect API?

I have an S3 connector deployed on MSK Connect, and a repository on GitHub with the JSON connector configuration file. I'd like to update the connector's configuration on demand via MSK's REST API. I've checked the API documentation, but it seems…
omer
  • 1,242
  • 4
  • 18
  • 45
1
vote
1 answer

how does the confluent s3 source connector know which files it has already ingested and which ones are new?

https://docs.confluent.io/kafka-connect-s3-source/current/ I think this connector polls s3 for a list of files -- but does it keep state about which ones it has processed and which ones are new? If it does store state, where is the state stored?
1
vote
0 answers

Reading Kafka topic and writing to s3 using kafka connector

I have the following dataframe schema: root |-- sentence: string (nullable = true) |-- category: string (nullable = true). I am writing the above dataframe successfully into a Kafka topic (s3.topic). Next I want to read s3.topic from Kafka and store…
tharindu
  • 513
  • 6
  • 26