I'm looking for a way to read Avro messages with a complex (nested) structure from Kafka using Spark Structured Streaming.
I then want to parse these messages, compare them with reference values from HBase, and save the outcome to HDFS or to another HBase table.
I started with the sample code below: https://github.com/Neuw84/spark-continuous-streaming/blob/master/src/main/java/es/aconde/structured/StructuredDemo.java
Avro message schema:
struct[mTimeSeries:
  struct[cName:string,
    eIpAddr:string,
    pIpAddr:string,
    pTime:string,
    mtrcs:array[struct[mName:string,
      xValues:array[bigint],
      yValues:array[string],
      rName:string]]]]
I am struggling to create a row with RowFactory.create for this schema. Do I need to iterate through the array fields? I understand that, once a dataset with this structure exists, I can use the explode function to denormalize it or to access the inner fields of the struct array, as I do in Hive. So I would like to create the row as is, i.e. exactly as the Avro message looks, and then use SQL functions for further transformation.
sparkSession.udf().register("deserialize", (byte[] data) -> {
    GenericRecord record = recordInjection.invert(data).get();
    // This is the line I am struggling with.
    return RowFactory.create(record.get("machine").toString(), record.get("sensor").toString(), record.get("data"), record.get("eventTime"));
}, DataTypes.createStructType(type.fields()));
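
To make the "create the row as is" idea concrete, here is a minimal sketch of the recursive shape such a conversion could take. It deliberately uses only the JDK so that it runs without Spark or Avro on the classpath, and everything in it is an assumption for illustration: a `Map` stands in for a decoded `GenericRecord`, the hypothetical `createRow` helper stands in for `RowFactory.create`, and the sample values are made up. As far as I understand, Spark expects a nested Row for a struct field and a list or array for an array field, and Avro usually decodes strings as a `CharSequence` (`Utf8`) rather than `String`, which is why the sketch calls `toString()` on them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedRowSketch {

    // Hypothetical stand-in for RowFactory.create(Object...): a "row" is an ordered Object[].
    public static Object[] createRow(Object... values) {
        return values;
    }

    // Recursively mirrors the message: record -> row, array -> list, string-like -> String.
    public static Object convert(Object value) {
        if (value instanceof Map) {
            // Stand-in for a nested GenericRecord: convert each field in schema order.
            List<Object> fields = new ArrayList<>();
            for (Object field : ((Map<?, ?>) value).values()) {
                fields.add(convert(field));
            }
            return createRow(fields.toArray());
        }
        if (value instanceof List) {
            // Array field: convert every element, keeping the order.
            List<Object> items = new ArrayList<>();
            for (Object item : (List<?>) value) {
                items.add(convert(item));
            }
            return items;
        }
        if (value instanceof CharSequence) {
            // String-like values (Utf8 in Avro) become plain java.lang.String.
            return value.toString();
        }
        return value; // longs and other primitives pass through unchanged
    }

    // Made-up sample message following the schema in the question.
    public static Map<String, Object> sampleMessage() {
        Map<String, Object> metric = new LinkedHashMap<>();
        metric.put("mName", "cpu");
        metric.put("xValues", Arrays.asList(1L, 2L));
        metric.put("yValues", Arrays.asList("0.5", "0.7"));
        metric.put("rName", "r1");

        Map<String, Object> series = new LinkedHashMap<>();
        series.put("cName", new StringBuilder("c1")); // mimics a non-String CharSequence
        series.put("eIpAddr", "10.0.0.1");
        series.put("pIpAddr", "10.0.0.2");
        series.put("pTime", "2019-01-01 00:00:00");
        series.put("mtrcs", Arrays.asList(metric));

        Map<String, Object> message = new LinkedHashMap<>();
        message.put("mTimeSeries", series);
        return message;
    }

    public static void main(String[] args) {
        Object[] top = (Object[]) convert(sampleMessage());
        Object[] series = (Object[]) top[0];          // mTimeSeries struct
        List<?> metrics = (List<?>) series[4];        // mtrcs array
        Object[] firstMetric = (Object[]) metrics.get(0);
        System.out.println("cName = " + series[0]);
        System.out.println("first metric = " + firstMetric[0] + ", xValues = " + firstMetric[1]);
    }
}
```

In the actual UDF, the `Map` branch would presumably iterate `record.getSchema().getFields()` and call `record.get(field.name())` for each field, `createRow` would be `RowFactory.create`, and the return type passed to `register` would have to be a StructType that mirrors the nested schema. With the row built this way, no manual flattening is needed up front; explode and the other SQL functions can be applied afterwards on the dataset.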