I'm trying to find out whether this is already part of Avro Tools, or whether there is a better way to automatically create random data as a SpecificRecord (not a GenericRecord) from a Java generated class.
Here's an example:
I have a record generated by Avro; let's call it SomeAvroGeneratedRecord. I was able to create a static method that does something like this:
SomeAvroGeneratedRecord record = SomeHelper.generateRandomData(SomeAvroGeneratedRecord.class);
So I was able to get this working - but I feel like somehow there's a better way to code this. I'm reaching out to see if someone can improve my answer.
I've been working on creating dummy data for my Avro classes. I looked at Avro Tools and found a holy grail: a wonderful class called org.apache.avro.util.RandomData. It works great!
However, it seems to only output generic data (GenericRecord instances). What I'm looking for is a way to take a generated Java class and create as many new instances as I'd like. And I'd like to do it on the fly, without having to write a line of code for every type I create.
The closest I found to this was a post here on StackOverflow: convert generic to specific record
I also found this random generator by Confluent.
However, the above code assumes you have the schema already.
This would allow me to take a random generic record and convert it to a specific record, but it would still require me to have the schema. Then I noticed that the schema is stored in every generated class as a static field named SCHEMA$. So I used some reflection to get that data:
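import java.lang.reflect.Field;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.specific.SpecificData;
import org.apache.avro.specific.SpecificRecordBase;
import org.apache.avro.util.RandomData;

// Looks up the static SCHEMA$ field on the generated class, then delegates to the schema-based overload.
public static <T extends SpecificRecordBase> T specificAvroRecordGenerator(Class<T> avroClassType) {
    try {
        Field field = avroClassType.getDeclaredField("SCHEMA$");
        return specificAvroRecordGenerator((Schema) field.get(null));
    } catch (IllegalAccessException | NoSuchFieldException e) {
        throw new RuntimeException(e);
    }
}

// Generates one random generic record for the schema and deep-copies it into its specific class.
@SuppressWarnings("unchecked")
public static <T extends SpecificRecordBase> T specificAvroRecordGenerator(Schema schema) {
    // RandomData is an Iterable; take the first (and only) generated element.
    GenericRecord test = (GenericRecord) new RandomData(schema, 1).iterator().next();
    // The unchecked cast relies on the caller asking for the type that matches the schema.
    return (T) SpecificData.get().deepCopy(test.getSchema(), test);
}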
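To look at the reflective SCHEMA$ lookup on its own, here is a minimal sketch that runs without Avro on the classpath. FakeGeneratedRecord and SchemaLookup are hypothetical names I made up for illustration, and the stand-in stores its schema as a String, whereas a real generated class holds an org.apache.avro.Schema there. I believe SpecificData also has a getSchema(Type) method that might replace the reflection entirely, but I haven't verified that against my Avro version.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for an Avro-generated class. A real generated class
// exposes a public static org.apache.avro.Schema SCHEMA$, not a String.
class FakeGeneratedRecord {
    public static final String SCHEMA$ =
        "{\"type\":\"record\",\"name\":\"FakeGeneratedRecord\",\"fields\":[]}";
}

// Hypothetical helper, not part of Avro.
class SchemaLookup {
    // Reads the static SCHEMA$ field of a class and casts it to the expected type.
    static <S> S readStaticSchema(Class<?> generatedClass, Class<S> schemaType) {
        try {
            Field field = generatedClass.getDeclaredField("SCHEMA$");
            if (!Modifier.isStatic(field.getModifiers())) {
                throw new IllegalArgumentException(
                    "SCHEMA$ is not static on " + generatedClass.getName());
            }
            // Static field, so no instance is needed for the read.
            return schemaType.cast(field.get(null));
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IllegalArgumentException(
                generatedClass.getName() + " has no accessible SCHEMA$ field", e);
        }
    }
}
```

Calling SchemaLookup.readStaticSchema(FakeGeneratedRecord.class, String.class) returns the stored schema text, and a class without the field fails with an IllegalArgumentException that names the class instead of a bare RuntimeException.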
Now, the code above DOES work. However, it feels so wonky that I'm surprised it works, to be honest. I tested it and the tests pass:
@Test
void testGenericAvroGenerator() {
    assertInstanceOf(PipelineDocument.class, SampleAvroData.specificAvroRecordGenerator(PipelineDocument.class));
    assertInstanceOf(ParsedArticle.class, SampleAvroData.specificAvroRecordGenerator(ParsedArticle.class));
}

// Renamed so the two test methods do not clash; this one exercises the Schema overload.
@Test
void testGenericAvroGeneratorFromSchema() {
    assertInstanceOf(PipelineDocument.class, SampleAvroData.specificAvroRecordGenerator(PipelineDocument.getClassSchema()));
    assertInstanceOf(ParsedArticle.class, SampleAvroData.specificAvroRecordGenerator(ParsedArticle.getClassSchema()));
}
So I ask: is there a better way to do this? I'm using JDK 17/19 for now, so I'm willing to use any new classes available. I'm happy with the results, but I want to make sure I'm not coding some monstrous hack.
The code above relies on reflection and on parts of the Avro generated classes that I'm not particularly used to. I feel like it could be tightened up and made clearer, or that there may already be a method that does something similar.
Any advice would be appreciated.