
I have an Avro schema file customer.avsc. I have already created the Avro object using the generated builder, and I can read the object back. I am wondering how to convert the Customer Avro object into bytes so it can be stored in the database. Thanks a lot!

    public static void main(String[] args) {

        // we can now build a customer in a "safe" way
        Customer.Builder customerBuilder = Customer.newBuilder();
        customerBuilder.setAge(30);
        customerBuilder.setFirstName("Mark");
        customerBuilder.setLastName("Simpson");
        customerBuilder.setAutomatedEmail(true);
        customerBuilder.setHeight(180f);
        customerBuilder.setWeight(90f);

        Customer customer = customerBuilder.build();
        System.out.println(customer);
        System.out.println(111111);

        // write it out to a file
        final DatumWriter<Customer> datumWriter = new SpecificDatumWriter<>(Customer.class);

        try (DataFileWriter<Customer> dataFileWriter = new DataFileWriter<>(datumWriter)) {
            dataFileWriter.create(customer.getSchema(), new File("customer-specific.avro"));
            dataFileWriter.append(customer);
            System.out.println("successfully wrote customer-specific.avro");
        } catch (IOException e){
            e.printStackTrace();
        }
    }
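
One option, if the goal is simply a `byte[]` that can go into a binary (e.g. BLOB/VARBINARY) column, is to point the same `DataFileWriter` at a `ByteArrayOutputStream` instead of a `File`. This is only a sketch, assuming the same generated `Customer` class; it keeps the self-describing Avro container format, so the schema travels with the bytes:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.file.SeekableByteArrayInput;
    import org.apache.avro.io.DatumWriter;
    import org.apache.avro.specific.SpecificDatumReader;
    import org.apache.avro.specific.SpecificDatumWriter;

    public class CustomerBytes {

        // Serialize one Customer into the Avro container format, in memory
        static byte[] toBytes(Customer customer) throws IOException {
            DatumWriter<Customer> datumWriter = new SpecificDatumWriter<>(Customer.class);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (DataFileWriter<Customer> dataFileWriter = new DataFileWriter<>(datumWriter)) {
                dataFileWriter.create(customer.getSchema(), out); // write to memory instead of a file
                dataFileWriter.append(customer);
            }
            return out.toByteArray();
        }

        // Read the Customer back; no external schema is needed because it is embedded in the bytes
        static Customer fromBytes(byte[] bytes) throws IOException {
            try (DataFileReader<Customer> reader = new DataFileReader<>(
                    new SeekableByteArrayInput(bytes), new SpecificDatumReader<>(Customer.class))) {
                return reader.next();
            }
        }
    }

The resulting `byte[]` can be bound to a prepared-statement parameter like any other binary value. The BinaryEncoder approach in the answer below produces smaller payloads because it does not embed the schema.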
  • Have you found the Avro `BinaryEncoder` class? – OneCricketeer Feb 10 '21 at 19:34
  • Thanks for your reply, and I have successfully converted the Avro object into bytes using BinaryEncoder. I am using SpecificRecord now, and I believe generating a Java class for the Avro schema is required. If I don't want to generate a Java class, do you know how to use the specific version / schema id for the consumer? I saw someone using Python to do that: https://stackoverflow.com/questions/60467878/how-to-programmatically-get-schema-from-confluent-schema-registry-in-python Do you know if there is a similar way for Java? Please correct me if I am wrong. Thanks a lot! (See the GenericRecord sketch after these comments.) – Shan Feb 10 '21 at 20:34
  • You can use the [Schema Registry Maven plugin](https://docs.confluent.io/platform/current/schema-registry/develop/maven-plugin.html#schema-registry-download) to download the latest version, then the [standard Avro Maven plugin](https://docs.confluent.io/platform/current/schema-registry/develop/maven-plugin.html#schema-registry-download) to generate a SpecificRecord subclass for that. I don't think the Maven plugin supports downloading specific versions, though, looking at the source code. You might find the Jackson Avro library useful, but that doesn't really integrate with the registry – OneCricketeer Feb 10 '21 at 22:54
  • The way we use Avro, we have the producers push versioned schemas+classes to a Maven repo **and** the Registry, which allows consumers to pull those just as regular Maven dependencies. But, if you are wanting to write data to MySQL and have the Confluent Schema Registry, you'd ideally be using Kafka Connect for this, which does not require specific classes or a custom consumer application – OneCricketeer Feb 10 '21 at 22:57
  • Thanks for your detailed explanation! The reason why I need to convert it into bytes is that I use the Debezium connector to implement the outbox pattern, and one column in the outbox table has a nested format (which requires either JSON, converted into a string, or Avro, converted into bytes), and the MySQL DB is the source. Based on the information you provided, I think pulling the schema from the registry as a Maven dependency to use in the consumer project is the best idea for future schema evolution. – Shan Feb 11 '21 at 03:24
  • You should be able to use ByteArrayConverter with this transform to get Connect to write bytes into a database column: https://docs.confluent.io/platform/current/connect/transforms/hoistfield.html – OneCricketeer Feb 11 '21 at 14:38
  • Thank you so much! Giving me a lot of directions and solutions!! – Shan Feb 12 '21 at 17:08
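
Regarding the follow-up about skipping the generated class: Avro's GenericRecord API can work directly from a parsed schema, whether it comes from a local .avsc file or from a registry lookup. A minimal sketch, assuming customer.avsc is on disk; the field names are assumptions based on the builder setters above, not taken from the actual schema:

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.Decoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;

    public class GenericCustomer {
        public static void main(String[] args) throws IOException {
            // Parse the schema at runtime instead of generating a Customer class from it
            Schema schema = new Schema.Parser().parse(new File("customer.avsc"));

            // Field names below are guesses; use the ones actually defined in customer.avsc
            GenericRecord customer = new GenericData.Record(schema);
            customer.put("first_name", "Mark");
            customer.put("last_name", "Simpson");
            customer.put("age", 30);
            customer.put("automated_email", true);
            customer.put("height", 180f);
            customer.put("weight", 90f);

            // Encode to raw Avro bytes
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(schema).write(customer, encoder);
            encoder.flush();
            byte[] bytes = out.toByteArray();

            // Decode with the same schema
            Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
            GenericRecord decoded = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
            System.out.println(decoded);
        }
    }

The same reader works whether the schema came from a local file or from the registry; the only requirement is that the reader and writer schemas are compatible.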

1 Answer


I used BinaryEncoder to solve this problem. With it, the Avro object can be converted into bytes and saved in the MySQL database. Then, when the data comes back from Kafka (bytes -> MySQL -> Debezium connector -> Kafka -> consumer API), I can decode the payload of that byte column into an Avro/Java object again with the same schema. Here is the code.

        Customer.Builder customerBuilder = Customer.newBuilder();
        customerBuilder.setAge(20);
        customerBuilder.setFirstName("first");
        customerBuilder.setLastName("last");
        customerBuilder.setAutomatedEmail(true);
        customerBuilder.setHeight(180f);
        customerBuilder.setWeight(90f);

        Customer customer = customerBuilder.build();

        // Serialize the Customer into raw Avro bytes (no schema embedded, unlike the container file format)
        DatumWriter<Customer> writer = new SpecificDatumWriter<>(customer.getSchema());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(customer, encoder);
        encoder.flush();
        out.close();

        byte[] serializedBytes = out.toByteArray();
        System.out.println("Sending message in bytes : " + java.util.Arrays.toString(serializedBytes));
//        String serializedHex = Hex.encodeHexString(serializedBytes);
//        System.out.println("Serialized Hex String : " + serializedHex);
//        KeyedMessage<String, byte[]> message = new KeyedMessage<String, byte[]>("page_views", serializedBytes);
//        producer.send(message);
//        producer.close();

        // Decode the bytes back into a Customer using the same schema (via the generated class)
        DatumReader<Customer> userDatumReader = new SpecificDatumReader<>(Customer.class);
        Decoder decoder = DecoderFactory.get().binaryDecoder(serializedBytes, null);
        Customer datum = userDatumReader.read(null, decoder);
        System.out.println(datum);
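
On the consuming side of the pipeline described above (bytes -> MySQL -> Debezium -> Kafka), the decoding step is the same. The sketch below is hypothetical: it assumes the record value reaching the consumer is the raw Avro bytes of the payload column (for example via the ByteArrayConverter/HoistField setup mentioned in the comments), and the topic name and consumer settings are placeholders:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.avro.io.DatumReader;
    import org.apache.avro.io.Decoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.specific.SpecificDatumReader;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class OutboxConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");            // placeholder broker
            props.put("group.id", "customer-outbox-consumer");           // placeholder group id
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", ByteArrayDeserializer.class.getName());

            DatumReader<Customer> reader = new SpecificDatumReader<>(Customer.class);

            try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("outbox.Customer")); // placeholder topic
                while (true) {
                    ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, byte[]> record : records) {
                        try {
                            // Decode the raw Avro bytes with the same Customer schema used when writing
                            Decoder decoder = DecoderFactory.get().binaryDecoder(record.value(), null);
                            Customer customer = reader.read(null, decoder);
                            System.out.println(customer);
                        } catch (java.io.IOException e) {
                            e.printStackTrace();
                        }
                    }
                }
            }
        }
    }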