I have an avro file which has records, then in their fields (which have uniontypes) there are other records, which also have fields with union types, and some types have a certain property connect.name
which i need to check if it equals to io.debezium.time.NanoTimestamp
. I`m doing this in Apache NiFi using an ExecuteScript processor with Groovy script.
A shortened example of the Avro schema:
{
"type": "record",
"name": "Envelope",
"namespace": "data.none.bpm.pruitsmdb_nautilus_dbo.fast_frequency_tables.avro.test",
"fields": [
{
"name": "before",
"type": [
"null",
{
"type": "record",
"name": "Value",
"fields": [
{
"name": "Id",
"type": {
"type": "string",
"connect.parameters": {
"__debezium.source.column.type": "UNIQUEIDENTIFIER",
"__debezium.source.column.length": "36"
}
}
},
{
"name": "CreatedOn",
"type": [
"null",
{
"type": "long",
"connect.version": 1,
"connect.parameters": {
"__debezium.source.column.type": "DATETIME2",
"__debezium.source.column.length": "27",
"__debezium.source.column.scale": "7"
},
"connect.name": "io.debezium.time.NanoTimestamp"
}
],
"default": null
},
{
"name": "CreatedById",
"type": [
"null",
{
"type": "string",
"connect.parameters": {
"__debezium.source.column.type": "UNIQUEIDENTIFIER",
"__debezium.source.column.length": "36"
}
}
],
"default": null
}
],
"connect.name": "data.none.bpm.pruitsmdb_nautilus_dbo.fast_frequency_tables.avro.test.Value"
}
],
"default": null
},
{
"name": "after",
"type": [
"null",
"Value"
],
"default": null
},
{
"name": "source",
"type": {
"type": "record",
"name": "Source",
"namespace": "io.debezium.connector.sqlserver",
"fields": [
{
"name": "version",
"type": "string"
},
{
"name": "ts_ms",
"type": "long"
},
{
"name": "snapshot",
"type": [
{
"type": "string",
"connect.version": 1,
"connect.parameters": {
"allowed": "true,last,false"
},
"connect.default": "false",
"connect.name": "io.debezium.data.Enum"
},
"null"
],
"default": "false"
}
],
"connect.name": "io.debezium.connector.sqlserver.Source"
}
},
{
"name": "op",
"type": "string"
},
{
"name": "ts_ms",
"type": [
"null",
"long"
],
"default": null
}
],
"connect.name": "data.none.bpm.pruitsmdb_nautilus_dbo.fast_frequency_tables.avro.test.Envelope"
}
My Groovy code, which obviously seems to be checking the top-level records only, and also I'm not sure whether I'm checking the property connect.name
correctly:
reader.forEach{ GenericRecord record ->
record.getSchema().getFields().forEach{ Schema.Field field ->
try {
field.schema().getTypes().forEach{ Schema typeSchema ->
if(typeSchema.getProp("connect.name") == "io.debezium.time.NanoTimestamp"){
record.put(field.name(), Long(record.get(field.name()).toString().substring(0, 13)))
typeSchema.addProp("logicalType", "timestamp-millis")
}
}
} catch(Exception ex){
println("Catching the exception")
}
}
writer.append(record)
}
My question is - how to traverse all nested Records (there are top-level records' fields which have "record" type and records inside) in the avro file? And when traversing their Fields - how to check correctly that one of their types (which may go in union) has a property connect.name == io.debezium.time.NanoTimestamp
and if yes, perform a transformation on the field value and add a logicalType
property to the field`s type?