
I am doing a PoC of Confluent Kafka Connect version 5.2.3. We are trying to copy messages from a topic to a file as a backup, and from this file back to a topic when we need them.

The topic has Key=string, Value=protobuf.

I am using

key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=com.blueapron.connect.protobuf.ProtobufConverter
value.converter.protoClassName=<proto class name>

Sink config

name=test
connector.class=FileStreamSink
tasks.max=1
file=test.txt
topics=testtopic

Source config

name=test
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topics=testtopic_connect

I am able to successfully sink it to a file, with file content as below:

Struct{<message in name value pair>}
Struct{<message in name value pair>}

....

I use the same file to source it back to a different topic. When I run the source, it throws the error:

String cannot be cast to org.apache.kafka.connect.data.Struct.

My questions are:

  • Why do I not see any key in the file when my Kafka topic has key-value pairs?
  • Why is the source not able to copy the content from the file to the topic, and why does it throw a casting-related error?
  • I get a similar error when I use the ByteArrayConverter provided by Kafka: String cannot be cast to bytes. Ideally, ByteArrayConverter should work for any kind of data.
  • Does blueapron only work with protobuf3?

2 Answers


Why do I not see any key in the file when my Kafka topic has key-value pairs?

The FileSink doesn't write the key, only the value, by calling .toString() on the Struct (or non-Struct) of that data. There is an SMT that moves the key over into the value - https://github.com/jcustenborder/kafka-connect-transform-archive

But the rows of the file would still look like Struct{key= ..., value=...}
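For example, with the sink config above, adding that SMT would look something like the following (a sketch only; the transform class name is taken from that project's README, so verify it against the version you install):

transforms=archive
transforms.archive.type=com.github.jcustenborder.kafka.connect.archive.Archive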

Why is the source not able to copy the content from the file to the topic, and why does it throw a casting-related error?

I get a similar error when I use the ByteArrayConverter provided by Kafka: String cannot be cast to bytes. Ideally, ByteArrayConverter should work for any kind of data.

The FileSource only reads line-delimited fields from the file as strings.
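For context, here is a paraphrased, self-contained sketch of what FileStreamSourceTask does for each line it reads (see the real code linked in the comments below; the topic name, file name, and offset values are placeholders): every record it emits has a plain STRING schema, a string value, and no key, which is why a converter expecting a Struct or raw bytes fails with a class-cast error.

import java.util.Collections;
import java.util.Map;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;

public class FileSourceStringSketch {
    public static void main(String[] args) {
        // Roughly what the task builds per line of the file (paraphrased, not the upstream code)
        Map<String, String> sourcePartition = Collections.singletonMap("filename", "test.txt");
        Map<String, Long> sourceOffset = Collections.singletonMap("position", 0L);
        String line = "Struct{...}"; // one line read from the file

        SourceRecord record = new SourceRecord(
                sourcePartition, sourceOffset,
                "testtopic_connect",        // target topic
                null,                       // no target partition
                null, null,                 // no key schema, no key
                Schema.STRING_SCHEMA,       // the value schema is always STRING
                line,                       // the raw line of text
                System.currentTimeMillis());

        System.out.println(record);
    }
}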

Does blueapron only work with protobuf3?

It seems like it - https://github.com/blueapron/kafka-connect-protobuf-converter/blob/v2.2.1/pom.xml#L21


Essentially, if you want to fully back up a topic (including timestamps, headers, and possibly even offsets), you'll need something that dumps and reads actual binary data. That is not what ByteArrayConverter does: it only (de)serializes the data from/to Kafka, not the source/sink file, and the FileSource connector will always read the file back as a string.
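If key and value bytes are really all you need, one option outside of Connect is a small Java consumer that dumps the raw bytes itself. Below is a minimal sketch (bootstrap servers, group id, topic, and file name are placeholders; it ignores timestamps and headers, writes null keys/values as empty arrays, and stops naively once a poll returns nothing):

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class TopicBackup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "topic-backup");               // placeholder
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
             DataOutputStream out = new DataOutputStream(new FileOutputStream("backup.bin"))) {
            consumer.subscribe(Collections.singletonList("testtopic")); // placeholder topic
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(5));
                if (records.isEmpty()) break; // naive "caught up" check
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // length-prefix key and value so the exact bytes can be read back later
                    byte[] key = record.key() == null ? new byte[0] : record.key();
                    byte[] value = record.value() == null ? new byte[0] : record.value();
                    out.writeInt(key.length);
                    out.write(key);
                    out.writeInt(value.length);
                    out.write(value);
                }
            }
        }
    }
}

Restoring is the reverse: read each length-prefixed key/value pair back from the file and send it with a KafkaProducer configured with ByteArraySerializer.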

  • I want to copy the messages (key, value) of the source topic into some file as a backup, so that I can recover those messages when I need them. I don't need timestamp, header, or offset info, just key and value. The other question is why the source fails with that error; it's not even copying the values – Abhishek Oct 15 '19 at 02:32
  • I understand what you're trying to do, but the FileSink connector isn't meant for that (other than data that's already a string). The error is because the `Struct{}` String format cannot be parsed back into Protobuf or bytes – OneCricketeer Oct 15 '19 at 04:19
  • Can I not even do that with ByteArrayConverter or any other converter? So you mean if I have protobuf messages (key, value) in a topic, I can only sink them to some file but cannot use a file source to dump them back into some other topic? What is the recommended way, if not Kafka Connect? Is there any better way than writing a Java consumer and producer – Abhishek Oct 15 '19 at 04:37
  • FileSource always sends data back to Kafka as a String - [source](https://github.com/apache/kafka/blob/trunk/connect/file/src/main/java/org/apache/kafka/connect/file/FileStreamSourceTask.java#L146-L154), so no. You need to use a different Connector entirely, similar to the one mentioned here - https://jobs.zalando.com/tech/blog/backing-up-kafka-zookeeper/ And you can use a tool like MinIO to get "local s3" buckets where files are written to. – OneCricketeer Oct 15 '19 at 14:44
  • We are not using AWS, and there are some limitations on using 3rd-party code. Can I use MirrorMaker or Replicator to copy the content within the same cluster? – Abhishek Oct 16 '19 at 09:50
  • MirrorMaker to the same cluster would create a replication loop unless you are able to rename the topic. Replicator would work better in that regard, though duplicating the topic within the same cluster isn't exactly a backup if you're trying to protect against the cluster failing... Besides that, Zookeeper stores the actual topic configurations, so you'd have to back that up as well – OneCricketeer Oct 16 '19 at 13:42
  • We only need the data. By the way, I also tried kafkacat to produce a file into a topic; that also failed with an invalid message error. Kafkacat at least dumps keys as well. But I'm not sure why it was throwing the invalid message error – Abhishek Oct 17 '19 at 04:13
  • Kafkacat consumer also only accepts string data, as far as I know, so Protobuf still wouldn't work – OneCricketeer Oct 17 '19 at 04:29
  • Kafkacat was successfully able to write in binary; it's failing only when using it as a producer – Abhishek Oct 17 '19 at 05:06
  • That's what I mean. When consuming a file or typed text (to produce to Kafka), it reads that data as text, not binary – OneCricketeer Oct 17 '19 at 16:27

Does blueapron only work with protobuf3?

Yes. Since we only use proto3 internally (we needed Python, Java, and Ruby code generation, and Ruby is only available for proto3), unfortunately we didn't build in support for proto2 definitions when first building this tool. I've updated the project readme to reflect that (https://github.com/blueapron/kafka-connect-protobuf-converter#compatibility).