1

I am attempting to generate an Avro schema from java to describe a table that I can access via JDBC.

I use the JDBC getMetaData() method to retrieve the relevant column metadata and store in an array list of "columnDetail" objects.

Column Detail defined as

private static class columnDetail {
    public String tableName;
    public String columnName;
    public String dataTypeName;
    public int dataTypeId;
    public String size;
    public String scale;
}

I then iterate through this array list and build up the Avro schema using the org.apache.avro.SchemaBuilder class.

My issue is around decimal logical types.

I iterate throuth the array list twice. The first time to add all fields to the FieldAssembler, the second to modify certain byte fields to add the decimal logical datatype.

The issue I am experiencing is that I get an error if the Decimal scale value changes between iterations.

As it iterates through the columnDetail array, it will work so long as the value "scale" does not change. If it does change, the following occurs:

Exception in thread "main" org.apache.avro.AvroRuntimeException: Can't overwrite property: scale
    at org.apache.avro.JsonProperties.addProp(JsonProperties.java:187)
    at org.apache.avro.Schema.addProp(Schema.java:134)
    at org.apache.avro.JsonProperties.addProp(JsonProperties.java:191)
    at org.apache.avro.Schema.addProp(Schema.java:139)
    at org.apache.avro.LogicalTypes$Decimal.addToSchema(LogicalTypes.java:193)
    at GenAvroSchema.main(GenAvroSchema.java:85)

I can prevent this by hardcoding the decimal size. i.e. I can replace

org.apache.avro.LogicalTypes.decimal(Integer.parseInt(cd.size),Integer.parseInt(cd.scale)).addToSchema(schema.getField(cd.columnName).schema());

with

org.apache.avro.LogicalTypes.decimal(18,2).addToSchema(schema.getField(cd.columnName).schema());

This however ends up with the same size datatype for all decimal fields which is not desirable.

Can someone help with this ?

Java: 1.8.0_202 Avro: avro-1.8.2.jar

My java code:

public static void main(String[] args) throws Exception{
    String jdbcURL = "jdbc:sforce://login.salesforce.com";
    String jdbcUser = "userid";
    String jdbcPassword = "password";
    String avroDataType = "";

    HashMap<String, String> dtmap = new HashMap<String, String>();
    dtmap.put("VARCHAR", "string");
    dtmap.put("BOOLEAN", "boolean");
    dtmap.put("NUMERIC", "bytes");
    dtmap.put("INTEGER", "int");
    dtmap.put("TIMESTAMP", "string");
    dtmap.put("DATE", "string");

    ArrayList<columnDetail> columnDetails = new ArrayList<columnDetail>();

    columnDetails = populateMetadata(jdbcURL, jdbcUser, jdbcPassword); // This works so have not included code here

    SchemaBuilder.FieldAssembler<Schema> fields = SchemaBuilder.builder().record("account").doc("Account Detials").fields() ;

    for(columnDetail cd:columnDetails) {
        avroDataType = dtmap.get(JDBCType.valueOf(cd.dataTypeId).getName());

        switch(avroDataType)
            {
                case "string":
                    fields.name(cd.columnName).type().unionOf().nullType().and().stringType().endUnion().nullDefault();
                    break;
                case "int":
                    fields.name(cd.columnName).type().unionOf().nullType().and().intType().endUnion().nullDefault();
                    break;
                case "boolean":
                    fields.name(cd.columnName).type().unionOf().booleanType().and().nullType().endUnion().booleanDefault(false);
                    break;
                case "bytes":
                    if(Integer.parseInt(cd.scale) == 0) {
                        fields.name(cd.columnName).type().unionOf().nullType().and().longType().endUnion().nullDefault();
                    } else {
                        fields.name(cd.columnName).type().bytesType().noDefault();
                    }
                    break;
                default:
                    fields.name(cd.columnName).type().unionOf().nullType().and().stringType().endUnion().nullDefault();
                    break;
            }
    }

    Schema schema = fields.endRecord();

    for(columnDetail cd:columnDetails) {
        avroDataType = dtmap.get(JDBCType.valueOf(cd.dataTypeId).getName());

        if(avroDataType == "bytes" && Integer.parseInt(cd.scale) != 0) {
            //org.apache.avro.LogicalTypes.decimal(Integer.parseInt(cd.size),Integer.parseInt(cd.scale)).addToSchema(schema.getField(cd.columnName).schema());
            org.apache.avro.LogicalTypes.decimal(18,2).addToSchema(schema.getField(cd.columnName).schema());
        }

    }

    BufferedWriter writer = new BufferedWriter(new FileWriter("./account.avsc"));
    writer.write(schema.toString());
    writer.close();
}

Thanks,

Eoin.

Adam Millerchip
  • 20,844
  • 5
  • 51
  • 74

0 Answers0