
I have a Kafka instance running on an AWS EC2 machine, acting as a producer to my AWS MSK cluster. To write the data to an S3 bucket, I have created an AWS MSK Connector using the configuration below:

connector.class=io.confluent.connect.s3.S3SinkConnector
partition.duration.ms=86400000
s3.region=us-east-1
topics.dir=prod/raw/bettopics
flush.size=20000
schema.compatibility=NONE
s3.part.size=5242880
tasks.max=1
timezone=UTC
topics=MSKTutorialTopic
locale=en-US
format.class=io.confluent.connect.s3.format.json.JsonFormat
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
storage.class=io.confluent.connect.s3.storage.S3Storage
path.format='dt'=YYYY-MM-dd
s3.bucket.name=pvs2-prod-msk
timestamp.extractor=Wallclock
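
With this configuration, the `TimeBasedPartitioner` plus the `Wallclock` timestamp extractor buckets records by the current UTC date, using `path.format` to build the directory name. A minimal Python sketch (not connector code; the `partition_path` helper is hypothetical) of the object-key prefix it should produce:

```python
from datetime import datetime, timezone

def partition_path(topics_dir: str, topic: str, now: datetime) -> str:
    # Mirrors path.format='dt'=YYYY-MM-dd with the Wallclock timestamp
    # extractor: every record is bucketed by the current (UTC) date.
    return f"{topics_dir}/{topic}/dt={now:%Y-%m-%d}"

# A record written on 2022-09-13 would land under:
print(partition_path("prod/raw/bettopics", "MSKTutorialTopic",
                     datetime(2022, 9, 13, tzinfo=timezone.utc)))
# prod/raw/bettopics/MSKTutorialTopic/dt=2022-09-13
```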

I am able to generate files in the S3 bucket using the DefaultPartitioner class with the configuration below, but I am unable to generate any files using the configuration above:

connector.class=io.confluent.connect.s3.S3SinkConnector
s3.region=us-east-1
schema.compatibility=NONE
tasks.max=1
topics=MSKTutorialTopic
format.class=io.confluent.connect.s3.format.json.JsonFormat
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
storage.class=io.confluent.connect.s3.storage.S3Storage
s3.bucket.name=pvs2-prod-msk
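
For contrast, the `DefaultPartitioner` groups objects by Kafka partition rather than by time, and since `topics.dir` is not set here, it falls back to its default of `topics`. A hypothetical sketch of the resulting prefix:

```python
def default_partition_path(topic: str, kafka_partition: int) -> str:
    # DefaultPartitioner keys objects by Kafka partition; topics.dir
    # defaults to "topics" because it is not set in this configuration.
    return f"topics/{topic}/partition={kafka_partition}"

print(default_partition_path("MSKTutorialTopic", 0))
# topics/MSKTutorialTopic/partition=0
```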

Am I missing any MSK Connector configuration details, or have I put in any incorrect ones? Any help would be much appreciated, thanks!

  • Figured it out. I thought `flush.size` was the maximum number of records a file could contain in a batch write, but if a batch contains fewer records than the defined `flush.size`, the connector waits for subsequent batches until the record count exceeds `flush.size`, and only then is the file produced to S3. So I decreased `flush.size` to 10, since I was producing records manually on an EC2 instance, and the moment I had produced 10 records the file appeared in S3 – Prashant Vikram Singh Sep 13 '22 at 18:05
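
The buffering behavior described in the comment can be sketched as follows (a simplified model, not the connector's actual implementation; real rotation can also be triggered by `rotate.interval.ms` or `rotate.schedule.interval.ms` if configured):

```python
def records_until_flush(buffered: int, flush_size: int) -> int:
    # The connector only commits a file to S3 once `flush_size` records
    # have accumulated for a partition; until then it keeps buffering.
    return max(flush_size - buffered, 0)

# With flush.size=20000 and only a handful of hand-produced records,
# no file is ever written; with flush.size=10 it appears after 10 records.
print(records_until_flush(5, 20000))   # 19995 more records needed
print(records_until_flush(10, 10))     # 0 -> file is flushed to S3
```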

0 Answers