
I am writing a Kafka connector to download data from several sources on GitHub (text and YAML files) and transform it into objects of a certain class, which is automatically generated from an avsc file:

{
  "type": "record",
  "name": "MatomoRecord",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "type", "type": "string"},
    {"name": "timestamp", "type": "long"}
  ]
}

So far everything has been successful. Now I have a Map of objects that I want to persist in a Kafka topic. For that I'm trying to create SourceRecords:

for (Map.Entry<String, MatomoRecord> record : records.entrySet()) {
  sourceRecords.add(new SourceRecord(
    sourcePartition,
    sourceOffset,
    matomoTopic,
    0,
    org.apache.kafka.connect.data.Schema.STRING_SCHEMA,
    record.getKey(),
    matomoSchema,
    record.getValue())
  );
}

How can I define the value schema of type org.apache.kafka.connect.data.Schema based on the Avro schema? For a test I have manually created a schema using the SchemaBuilder:

Schema matomoSchema = SchemaBuilder.struct()
                .name("MatomoRecord")
                .field("name", Schema.STRING_SCHEMA)
                .field("type", Schema.STRING_SCHEMA)
                .field("timestamp", Schema.INT64_SCHEMA)
                .build();

The result was:

org.apache.kafka.connect.errors.DataException: Invalid type for STRUCT: class MatomoRecord

Could somebody help me define the value schema based on the Avro schema?

Best regards, Martin


2 Answers


You can't use record.getValue(), nor is there a direct API from Avro to a Connect Schema (without using internal methods of Confluent's AvroConverter).

You need to parse that object into a Struct object that matches the schema you've defined (which looks fine, assuming none of your object fields can be null).

Look at the Javadoc for how you can define it: https://kafka.apache.org/22/javadoc/org/apache/kafka/connect/data/Struct.html
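A minimal sketch of what that looks like for your schema. The field values here are sample literals; in your connector loop they would come from entry.getValue() (e.g. getName(), getType(), getTimestamp(), assuming the usual Avro-generated getters on MatomoRecord):

```java
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;

// Same schema you already built
Schema matomoSchema = SchemaBuilder.struct()
        .name("MatomoRecord")
        .field("name", Schema.STRING_SCHEMA)
        .field("type", Schema.STRING_SCHEMA)
        .field("timestamp", Schema.INT64_SCHEMA)
        .build();

// The SourceRecord value must be a Struct built against that schema,
// not the Avro-generated MatomoRecord object itself
Struct value = new Struct(matomoSchema)
        .put("name", "visits")             // e.g. record.getValue().getName()
        .put("type", "metric")             // e.g. record.getValue().getType()
        .put("timestamp", 1559649600000L); // e.g. record.getValue().getTimestamp()

value.validate(); // throws DataException if a required field was not set
```

Then pass value (together with matomoSchema) as the last two arguments of the SourceRecord constructor, in place of record.getValue().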

Note (not relevant here): nested structs should be built from the "bottom up", putting child structs/arrays into parent ones.

Your connector should not necessarily depend on Avro other than to include your model objects. The Converter interfaces are responsible for converting your Struct with its Schema into other data formats (JSON, Confluent's Avro encoding, Protobuf, etc.).
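For example, a worker configured along these lines (the Schema Registry URL is an assumption, not something from your setup) would serialize the Struct your connector emits as Confluent-encoded Avro on the wire, without the connector code touching Avro at all:

```properties
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
```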


A Kafka Connect Schema is a JSON schema that looks awfully like an Avro schema. Try org.apache.kafka.connect.json.JsonConverter#asConnectSchema - you may need to massage the Avro schema to make it work.
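A hedged sketch of that idea. Note that asConnectSchema expects Connect's own JSON schema dialect, not a raw Avro schema, so the Avro schema from the question is rewritten first (record becomes struct, name becomes field, long becomes int64):

```java
import java.util.HashMap;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.json.JsonConverter;

// The Avro schema from the question, massaged into Connect's JSON dialect
String connectJson =
        "{\"type\":\"struct\",\"name\":\"MatomoRecord\",\"fields\":["
      + "{\"field\":\"name\",\"type\":\"string\"},"
      + "{\"field\":\"type\",\"type\":\"string\"},"
      + "{\"field\":\"timestamp\",\"type\":\"int64\"}]}";

JsonConverter converter = new JsonConverter();
Map<String, Object> config = new HashMap<>();
config.put("schemas.enable", "true");
converter.configure(config, false); // false = value converter; also initializes internal caches

JsonNode node = new ObjectMapper().readTree(connectJson);
Schema schema = converter.asConnectSchema(node);
```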

  • Thank you for your answer, Liam. Yes, it is an Avro schema. Could you please explain your idea with `org.apache.kafka.connect.json.Schema#asConnectSchema`? – biomartin Jun 04 '19 at 11:22
  • Is that method correct? Do you have example usage? It wouldn't take an Avro schema because Kafka core connect api has no Avro dependencies – OneCricketeer Jun 04 '19 at 12:30
  • It takes a JSON representation of a schema. – Liam Clarke Jun 04 '19 at 22:05
  • Again, do you have example usage? I am genuinely interested in how that method works... Besides, that `Schema` class doesn't even exist, so I don't know what you're referring to https://github.com/apache/kafka/tree/trunk/connect/json/src/main/java/org/apache/kafka/connect/json – OneCricketeer Jun 05 '19 at 15:30
  • Sorry @cricket_007 got the wrong class somehow, I blame a lack of coffee, it's actually on the `JsonConverter`. Main differences between the Avro and Connect schema JSON representations: `record` is `struct`, `name` is `field`, `int` is `int32`, `long` is `int64` , The above goes for float/double etc. Connect can't handle a `type` field that isn't a string, so no support for Avro enum types, or Avro's representation of nullable types etc. But those are the only things I had to change. Applied to the OP's schema: https://gist.github.com/LiamClarkeNZ/177b83d1fa10db72a01c8295aa0cae23 – Liam Clarke Jun 05 '19 at 23:05
  • Hmm.. I would opt to do that programmatically just given the Avro schema rather than do string replacements - https://github.com/confluentinc/schema-registry/blob/master/avro-converter/src/main/java/io/confluent/connect/avro/AvroData.java#L1167-L1175 – OneCricketeer Jun 06 '19 at 01:40
  • That looks way better, I thought there must be something along those lines given how much Confluent pushes Avro. – Liam Clarke Jun 06 '19 at 01:53
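A sketch of the programmatic route from the last two comments, assuming the io.confluent:kafka-connect-avro-converter artifact is on the classpath. AvroData#toConnectSchema maps an org.apache.avro.Schema straight to a Connect Schema, with no string manipulation:

```java
import io.confluent.connect.avro.AvroData;
import org.apache.kafka.connect.data.Schema;

// The Avro schema from the question; for the generated class,
// MatomoRecord.getClassSchema() would return the same thing
org.apache.avro.Schema avroSchema = new org.apache.avro.Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"MatomoRecord\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"type\",\"type\":\"string\"},"
      + "{\"name\":\"timestamp\",\"type\":\"long\"}]}");

AvroData avroData = new AvroData(20); // constructor argument is a schema cache size
Schema connectSchema = avroData.toConnectSchema(avroSchema);
```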