
I'm trying to find out whether Avro Tools already provides this, or whether there is a better way to automatically create random data for a SpecificRecord (not a GenericRecord) from a Java class generated by Avro.

Here's an example:

I have a record class generated by Avro - let's call it SomeAvroGeneratedRecord. I was able to write a static method that is used like this:

SomeAvroGeneratedRecord record = SomeHelper.generateRandomData(SomeAvroGeneratedRecord.class);

I got this working, but I feel there must be a better way to code it. I'm reaching out to see if someone can improve on my solution.

I've been working on creating dummy data for my Avro classes. I looked through Avro Tools and found a holy grail: a wonderful class called org.apache.avro.util.RandomData. It works great!

However, it seems to only produce generic data (GenericRecord instances). What I'm looking for is to take a generated Java class and create as many new instances of it as I'd like, on the fly, without having to write a line of code for every type I create.

The closest I found to this was a post here on StackOverflow: convert generic to specific record

I also found this random generator by Confluent.

However, the above code assumes you already have the schema.

The linked post would let me take a random generic record and convert it to a specific record, but I would still need the schema. I then noticed that every generated class exposes its schema as a static field named SCHEMA$, so I used some reflection to read it:

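import java.lang.reflect.Field;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.specific.SpecificData;
import org.apache.avro.specific.SpecificRecordBase;
import org.apache.avro.util.RandomData;

// Looks up the static SCHEMA$ field on the generated class and delegates to the Schema overload.
public static <T extends SpecificRecordBase> T specificAvroRecordGenerator(Class<T> avroClassType) {
    try {
        Field field = avroClassType.getDeclaredField("SCHEMA$");
        return specificAvroRecordGenerator((Schema) field.get(null)); // static field, so no instance is needed
    } catch (IllegalAccessException | NoSuchFieldException e) {
        throw new RuntimeException(e);
    }
}

// Generates one random GenericRecord and deep-copies it into the matching specific class.
@SuppressWarnings("unchecked")
public static <T extends SpecificRecordBase> T specificAvroRecordGenerator(Schema schema) {
    GenericRecord test = (GenericRecord) new RandomData(schema, 1).iterator().next();
    return (T) SpecificData.get().deepCopy(test.getSchema(), test);
}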
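To make the reflection step easier to reason about on its own, here is a minimal sketch that uses only the JDK. FakeGeneratedRecord and StaticFieldLookup are hypothetical stand-ins that do not exist in Avro, and the schema is held as a plain String rather than an org.apache.avro.Schema so the snippet runs without the Avro jars. It shows the same idea as above: a static field is read with a null receiver, and the result is cast in a checked way instead of with an unchecked cast.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for an Avro-generated class; only the static SCHEMA$ field matters here.
class FakeGeneratedRecord {
    public static final String SCHEMA$ =
            "{\"type\":\"record\",\"name\":\"FakeGeneratedRecord\",\"fields\":[]}";
}

// Hypothetical helper, not part of Avro.
final class StaticFieldLookup {
    private StaticFieldLookup() {}

    // Reads a public static field by name and casts it to the expected type.
    static <V> V readStaticField(Class<?> owner, String fieldName, Class<V> fieldType) {
        try {
            Field field = owner.getField(fieldName);
            if (!Modifier.isStatic(field.getModifiers())) {
                throw new IllegalArgumentException(fieldName + " is not static on " + owner.getName());
            }
            // A null receiver is valid because the field is static.
            return fieldType.cast(field.get(null));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read " + fieldName + " from " + owner.getName(), e);
        }
    }
}
```

With real generated classes the call would presumably look like StaticFieldLookup.readStaticField(SomeAvroGeneratedRecord.class, "SCHEMA$", Schema.class). Catching ReflectiveOperationException covers both NoSuchFieldException and IllegalAccessException in one clause.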

Now, the code above DOES work. However, it feels really wonky, and to be honest I'm surprised it works at all. I tested it and the tests pass:

    @Test
    void testGenericAvroGenerator() {
        assertInstanceOf(PipelineDocument.class, SampleAvroData.specificAvroRecordGenerator(PipelineDocument.class));
        assertInstanceOf(ParsedArticle.class, SampleAvroData.specificAvroRecordGenerator(ParsedArticle.class));
    }

    @Test
    void testGenericAvroGeneratorFromSchema() {
        assertInstanceOf(PipelineDocument.class, SampleAvroData.specificAvroRecordGenerator(PipelineDocument.getClassSchema()));
        assertInstanceOf(ParsedArticle.class, SampleAvroData.specificAvroRecordGenerator(ParsedArticle.getClassSchema()));
    }

So I ask: is there a better way to do this? I'm using JDK 17/19 for now, so I'm willing to use any newer classes that are available. I'm happy with the results, but I want to make sure I'm not coding some monstrous hack.

The approach above leans on reflection and on parts of the Avro-generated classes that I'm not particularly familiar with. I feel it could be tightened up and made clearer, or replaced by an existing method that already does something similar.

Any advice would be appreciated.
