I'm using Spark 2.1.1 and Scala 2.11.8
This question is an extension of one of my earlier questions:
How to identify null fields in a csv file?
The change is that rather than reading the data from a CSV file, I'm now reading it from an avro file. This is the format of the record I'm reading from the avro file:
var ttime: Long = 0;
var eTime: Long = 0;
var tids: String = "";
var tlevel: Integer = 0;
var tboot: Long = 0;
var rNo: Integer = 0;
var varType: String = "";
var uids: List[TRUEntry] = Nil;
I'm parsing the avro file in a separate class.
I have to map the tids column with each one of the uids, in the same way as in the accepted answer of the question linked above, except this time the data comes from an avro file rather than a well-formatted CSV file. How can I do this?
This is the code I'm trying to do it with:
import com.databricks.spark.avro._  // needed for spark.read.avro in Spark 2.x

val avroRow = spark.read.avro(inputString).rdd
val avroParsed = avroRow
  .map(x => new TRParser(x))
  .map((obj: TRParser) => ((obj.tids, obj.uId), 1))
  .reduceByKey(_ + _)
  .saveAsTextFile(outputString)
After obj.tids, each of the uids has to be mapped individually, so that the final output is the same as in the accepted answer of the question linked above.
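To illustrate the pairing I'm after, here is a sketch of the same flatMap-then-count pattern on plain Scala collections, using stand-in case classes with the field names from my parser (TRParser, TRUEntry, tids, uId are my own names; on the actual RDD the calls would be flatMap and reduceByKey):

```scala
// Stand-in case classes mirroring the parser fields (hypothetical
// simplified versions of my TRParser / TRUEntry classes).
case class TRUEntry(uId: String, initM: Long)
case class TRParser(tids: String, uids: List[TRUEntry])

// Pair tids with every uId individually, then count each pair.
// On a Spark RDD this would be .flatMap(...).reduceByKey(_ + _).
def pairCounts(rows: List[TRParser]): Map[(String, String), Int] =
  rows
    .flatMap(obj => obj.uids.map(u => ((obj.tids, u.uId), 1))) // one record per (tids, uId)
    .groupBy(_._1)                                             // group identical pairs
    .mapValues(_.map(_._2).sum)                                // sum the 1s per pair
    .toMap
```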
This is how I'm parsing all the uids in the avro file parsing class:
this.uids = Nil
row.getAs[Seq[Row]]("uids").foreach { (objRow: Row) =>
  this.uids ::= new TRUEntry(objRow)
}

this.uids.foreach { (obj: TRUEntry) =>
  uInfo += obj.uId + " , " + obj.initM.toString + " , "
}
P.S.: I apologise if the question seems dumb, but this is my first encounter with avro files.