
I would like to use the Kafka Connect S3 sink connector to stream data out of a topic to an S3 bucket. The data inside the topic will be XML messages. As per the connector config, we can define the format of the message (for example: `JsonFormat`).

As per the Confluent docs, it looks like we can define a custom format by implementing `io.confluent.connect.storage.format.Format`.
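
Based on how `JsonFormat` is put together, I assume a custom XML format would look roughly like the sketch below. `XmlFormat` and `XmlRecordWriterProvider` are names I made up, and the constructor taking `S3Storage` simply mirrors what `JsonFormat` does, so please correct me if that part is wrong:

```java
import io.confluent.connect.s3.S3SinkConnectorConfig;
import io.confluent.connect.s3.storage.S3Storage;
import io.confluent.connect.storage.format.Format;
import io.confluent.connect.storage.format.RecordWriterProvider;
import io.confluent.connect.storage.format.SchemaFileReader;

/**
 * Rough sketch of a custom format, modeled on JsonFormat from
 * kafka-connect-storage-cloud. XmlRecordWriterProvider is a made-up class,
 * sketched further below.
 */
public class XmlFormat implements Format<S3SinkConnectorConfig, String> {

  private final S3Storage storage;

  // Assumption: the connector instantiates the Format reflectively with the
  // storage object, the same way JsonFormat is constructed.
  public XmlFormat(S3Storage storage) {
    this.storage = storage;
  }

  @Override
  public RecordWriterProvider<S3SinkConnectorConfig> getRecordWriterProvider() {
    return new XmlRecordWriterProvider(storage);
  }

  @Override
  public SchemaFileReader<S3SinkConnectorConfig, String> getSchemaFileReader() {
    throw new UnsupportedOperationException("Reading schemas from S3 is not supported here");
  }

  @Override
  public Object getHiveFactory() {
    throw new UnsupportedOperationException("Hive integration is not supported here");
  }
}
```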

I was going through the available format code, like `JsonFormat`, and it looks like the actual formatting logic lives in `JsonRecordWriterProvider`, which is an implementation of `io.confluent.connect.storage.format.RecordWriterProvider`.

I see that the `RecordWriter` write implementation applies the `JsonConverter` to `SinkRecord.value()`.

How can we know what a `SinkRecord` contains, and can we just write an XML converter that turns `SinkRecord.value()` into a DOM object, etc.?
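
To make the question concrete, here is the kind of `RecordWriterProvider` I have in mind, modeled on `JsonRecordWriterProvider`. The `storage.create` / `S3OutputStream.commit` plumbing is copied from my reading of that class and may need adjusting for the connector version in use; the sketch just writes the record value out as XML text when it arrives as a `String` or `byte[]` (e.g. via `StringConverter` or `ByteArrayConverter`):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.sink.SinkRecord;

import io.confluent.connect.s3.S3SinkConnectorConfig;
import io.confluent.connect.s3.storage.S3OutputStream;
import io.confluent.connect.s3.storage.S3Storage;
import io.confluent.connect.storage.format.RecordWriter;
import io.confluent.connect.storage.format.RecordWriterProvider;

/**
 * Rough sketch of a RecordWriterProvider that writes each record value out as
 * XML text, modeled on JsonRecordWriterProvider.
 */
public class XmlRecordWriterProvider implements RecordWriterProvider<S3SinkConnectorConfig> {

  private static final String EXTENSION = ".xml";
  private static final byte[] LINE_SEPARATOR =
      System.lineSeparator().getBytes(StandardCharsets.UTF_8);

  private final S3Storage storage;

  public XmlRecordWriterProvider(S3Storage storage) {
    this.storage = storage;
  }

  @Override
  public String getExtension() {
    return EXTENSION;
  }

  @Override
  public RecordWriter getRecordWriter(S3SinkConnectorConfig conf, String filename) {
    return new RecordWriter() {
      // Assumption: storage.create() returns an S3OutputStream whose commit()
      // completes the multipart upload, as JsonRecordWriterProvider uses it.
      final S3OutputStream out = storage.create(filename, true);

      @Override
      public void write(SinkRecord record) {
        // A SinkRecord carries topic(), kafkaPartition(), kafkaOffset(),
        // key()/keySchema(), value()/valueSchema() and timestamp(). With
        // StringConverter the value is a String, with ByteArrayConverter a byte[].
        Object value = record.value();
        try {
          if (value instanceof byte[]) {
            out.write((byte[]) value);
          } else if (value instanceof String) {
            out.write(((String) value).getBytes(StandardCharsets.UTF_8));
          } else {
            throw new ConnectException("Expected String or byte[] XML payload but got: "
                + (value == null ? "null" : value.getClass()));
          }
          out.write(LINE_SEPARATOR);
        } catch (IOException e) {
          throw new ConnectException(e);
        }
      }

      @Override
      public void commit() {
        try {
          out.commit();
        } catch (IOException e) {
          throw new ConnectException(e);
        }
      }

      @Override
      public void close() {
        try {
          out.close();
        } catch (IOException e) {
          throw new ConnectException(e);
        }
      }
    };
  }
}
```

If an actual DOM is needed (e.g. to validate or restructure the XML before writing), I imagine the value could be parsed inside `write()` with `javax.xml.parsers.DocumentBuilderFactory` and re-serialized, but the plain pass-through above is what I am after first.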

Any references that I could use to implement one?

I was going through the code provided by Confluent:

https://github.com/confluentinc/kafka-connect-storage-cloud/tree/master/kafka-connect-s3/src/main/java/io/confluent/connect/s3/format

VSK
  • You would need to learn to convert the `SinkRecord.valueSchema` and optionally `SinkRecord.keySchema` from a [`Schema`](https://kafka.apache.org/21/javadoc/org/apache/kafka/connect/data/Schema.html) object into an XSD record, then populate an XML record from the XSD – OneCricketeer Oct 28 '19 at 19:07
  • Does it have to be one XSD for all the XML messages in the Kafka topic? – VSK Oct 28 '19 at 19:35
  • The schema could change over time... at least Avro and JSON can – OneCricketeer Oct 28 '19 at 19:36
  • I mean, I don't care about the schema... all I need is to convert the string or byte[] in the Kafka topic to a DOM object, which I want to put into S3 file by file. Is there a way to do it without defining a schema? – VSK Oct 28 '19 at 20:32
  • Kafka Connect relies on Schemas, AFAIK. But you can check if the key or value schema is null... But then you would not know if the key or values are strings, bytes, int, boolean, etc. – OneCricketeer Oct 28 '19 at 21:00
  • Here is what I am looking for... every XML message that will be in the Kafka topic will have a common root tag, and there will be a type tag; the rest can be different inside the root tag. All I need in the S3 sink connector is the value between the type tags, which I want to use as the folder name in the S3 bucket via a custom partitioner. Can I do this with the StringConverter that Kafka provides out of the box? I will just write a custom partitioner class if there is a way to get the value – VSK Oct 29 '19 at 13:15
  • Sounds like you want a Transform, not just a Partitioner. Partitioner only works on structured objects (those with schemas) – OneCricketeer Oct 29 '19 at 13:23
  • Let's say I put the type value as the key of the message when I put it in Kafka; can I use a partitioner to define the destination location in S3 as I desire? – VSK Oct 29 '19 at 13:33
  • Or can you tell me how to achieve this using a Transformation? – VSK Oct 29 '19 at 13:43
  • I believe partitioners only work on fields in the value. And again, partitioners cannot work with the StringConverter. You would have to write your own `Transformation` implementation that knows how to parse XML and return a Structured record to consume. [The source code of existing ones is here](https://github.com/apache/kafka/tree/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms), but I have no reference on how you would actually write your own and deploy it, other than [one I created myself](https://github.com/cricket007/schema-registry-transfer-smt#installation) – OneCricketeer Oct 29 '19 at 16:12
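
Following up on the Transformation suggestion in the comments: below is a rough sketch of the kind of SMT I understand is being suggested. `XmlTypeExtractor` and the `type`/`payload` field names are made up; the idea is that the structured value it returns could then be partitioned with `FieldPartitioner` and `partition.field.name=type`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.errors.DataException;
import org.apache.kafka.connect.transforms.Transformation;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/**
 * Rough sketch of an SMT that parses an XML string value, pulls out the text
 * of the type element and returns a structured record, so that a field-based
 * partitioner can use it. The field names "type" and "payload" are made up.
 */
public class XmlTypeExtractor<R extends ConnectRecord<R>> implements Transformation<R> {

  private static final Schema VALUE_SCHEMA = SchemaBuilder.struct()
      .name("XmlEnvelope")
      .field("type", Schema.STRING_SCHEMA)
      .field("payload", Schema.STRING_SCHEMA)
      .build();

  @Override
  public R apply(R record) {
    if (!(record.value() instanceof String)) {
      throw new DataException("Expected the XML payload as a String (e.g. via StringConverter)");
    }
    String xml = (String) record.value();
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      NodeList types = doc.getElementsByTagName("type");
      if (types.getLength() == 0) {
        throw new DataException("No type element found in record value");
      }
      Struct value = new Struct(VALUE_SCHEMA)
          .put("type", types.item(0).getTextContent())
          .put("payload", xml);
      // Keep topic, partition, key and timestamp; replace the value with the Struct.
      return record.newRecord(record.topic(), record.kafkaPartition(),
          record.keySchema(), record.key(), VALUE_SCHEMA, value, record.timestamp());
    } catch (DataException e) {
      throw e;
    } catch (Exception e) {
      throw new DataException("Failed to parse XML value", e);
    }
  }

  @Override
  public ConfigDef config() {
    return new ConfigDef();
  }

  @Override
  public void configure(Map<String, ?> configs) {
    // No configuration for this sketch.
  }

  @Override
  public void close() {
    // Nothing to clean up.
  }
}
```

It would be registered through the connector's `transforms` configuration and dropped onto the plugin path, similar to the SMT linked in the last comment.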

0 Answers