I am attempting to generate an Avro schema from Java to describe a table that I can access via JDBC.
I use the JDBC getMetaData() method to retrieve the relevant column metadata and store it in an ArrayList of "columnDetail" objects.
columnDetail is defined as:
private static class columnDetail {
    public String tableName;
    public String columnName;
    public String dataTypeName;
    public int dataTypeId;
    public String size;
    public String scale;
}
I then iterate through this array list and build up the Avro schema using the org.apache.avro.SchemaBuilder class.
My issue is with the decimal logical type.
I iterate through the array list twice: the first pass adds all fields to the FieldAssembler, and the second modifies certain bytes fields to add the decimal logical type.
I get an error if the decimal scale value changes from one column to the next during the second pass.
As long as the value of "scale" is the same for every decimal column, it works. If it changes, the following occurs:
Exception in thread "main" org.apache.avro.AvroRuntimeException: Can't overwrite property: scale
    at org.apache.avro.JsonProperties.addProp(JsonProperties.java:187)
    at org.apache.avro.Schema.addProp(Schema.java:134)
    at org.apache.avro.JsonProperties.addProp(JsonProperties.java:191)
    at org.apache.avro.Schema.addProp(Schema.java:139)
    at org.apache.avro.LogicalTypes$Decimal.addToSchema(LogicalTypes.java:193)
    at GenAvroSchema.main(GenAvroSchema.java:85)
I can prevent this by hardcoding the decimal size and scale, i.e. I can replace
org.apache.avro.LogicalTypes.decimal(Integer.parseInt(cd.size),Integer.parseInt(cd.scale)).addToSchema(schema.getField(cd.columnName).schema());
with
org.apache.avro.LogicalTypes.decimal(18,2).addToSchema(schema.getField(cd.columnName).schema());
However, this gives every decimal field the same size and scale, which is not desirable.
Can someone help with this?
Java: 1.8.0_202 Avro: avro-1.8.2.jar
My Java code:
public static void main(String[] args) throws Exception{
    String jdbcURL = "jdbc:sforce://login.salesforce.com";
    String jdbcUser = "userid";
    String jdbcPassword = "password";
    String avroDataType = "";
    HashMap<String, String> dtmap = new HashMap<String, String>();
    dtmap.put("VARCHAR", "string");
    dtmap.put("BOOLEAN", "boolean");
    dtmap.put("NUMERIC", "bytes");
    dtmap.put("INTEGER", "int");
    dtmap.put("TIMESTAMP", "string");
    dtmap.put("DATE", "string");
    ArrayList<columnDetail> columnDetails = new ArrayList<columnDetail>();
    columnDetails = populateMetadata(jdbcURL, jdbcUser, jdbcPassword); // This works so have not included code here
    SchemaBuilder.FieldAssembler<Schema> fields = SchemaBuilder.builder().record("account").doc("Account Details").fields();
    for(columnDetail cd:columnDetails) {
        avroDataType = dtmap.get(JDBCType.valueOf(cd.dataTypeId).getName());
        switch(avroDataType)
        {
            case "string":
                fields.name(cd.columnName).type().unionOf().nullType().and().stringType().endUnion().nullDefault();
                break;
            case "int":
                fields.name(cd.columnName).type().unionOf().nullType().and().intType().endUnion().nullDefault();
                break;
            case "boolean":
                fields.name(cd.columnName).type().unionOf().booleanType().and().nullType().endUnion().booleanDefault(false);
                break;
            case "bytes":
                if(Integer.parseInt(cd.scale) == 0) {
                    fields.name(cd.columnName).type().unionOf().nullType().and().longType().endUnion().nullDefault();
                } else {
                    fields.name(cd.columnName).type().bytesType().noDefault();
                }
                break;
            default:
                fields.name(cd.columnName).type().unionOf().nullType().and().stringType().endUnion().nullDefault();
                break;
        }
    }
    Schema schema = fields.endRecord();
    for(columnDetail cd:columnDetails) {
        avroDataType = dtmap.get(JDBCType.valueOf(cd.dataTypeId).getName());
        // Compare string contents with equals() rather than reference identity (==)
        if("bytes".equals(avroDataType) && Integer.parseInt(cd.scale) != 0) {
            //org.apache.avro.LogicalTypes.decimal(Integer.parseInt(cd.size),Integer.parseInt(cd.scale)).addToSchema(schema.getField(cd.columnName).schema());
            org.apache.avro.LogicalTypes.decimal(18,2).addToSchema(schema.getField(cd.columnName).schema());
        }
    }
    BufferedWriter writer = new BufferedWriter(new FileWriter("./account.avsc"));
    writer.write(schema.toString());
    writer.close();
}
Thanks,
Eoin.
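
One possible explanation, which the post itself does not confirm: within a single builder, bytesType() may hand back the same bytes Schema instance for every field, so the second addToSchema call tries to set "scale" on a schema that already carries it. Under that assumption, a minimal sketch of a workaround is to create a fresh bytes schema per column, annotate it with the decimal logical type, and pass it to type(Schema) in a single pass. The class name DecimalFieldSketch, the helper buildRecord, and the column names "amount" and "rate" are hypothetical and used only for illustration.

```java
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class DecimalFieldSketch {

    // Hypothetical helper: names[i], precisions[i] and scales[i] describe column i.
    // Each decimal column gets its own bytes schema, so precision and scale
    // are set exactly once per Schema instance.
    static Schema buildRecord(String[] names, int[] precisions, int[] scales) {
        SchemaBuilder.FieldAssembler<Schema> fields =
                SchemaBuilder.record("account").fields();
        for (int i = 0; i < names.length; i++) {
            // Schema.create returns a new, unshared bytes schema on every call.
            Schema decimal = LogicalTypes.decimal(precisions[i], scales[i])
                    .addToSchema(Schema.create(Schema.Type.BYTES));
            fields = fields.name(names[i]).type(decimal).noDefault();
        }
        return fields.endRecord();
    }

    public static void main(String[] args) {
        Schema schema = buildRecord(
                new String[] {"amount", "rate"},
                new int[] {18, 10},
                new int[] {2, 4});
        // Pretty-print the generated schema.
        System.out.println(schema.toString(true));
    }
}
```

With this approach the second loop over the columns would no longer be needed, since each field is already annotated when it is added. If the decimal columns must be nullable, the annotated schema could presumably be wrapped in a union with null before being passed to type(Schema).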