I'm using Spark 2.1.1 and Scala 2.11.8
This question is an extension of one of my earlier questions:
How to identify null fields in a csv file?
The change is that rather than reading the data from a CSV file, I'm now reading it from an avro file. This is the format of the record I'm reading from the avro file:
var ttime: Long = 0;
var eTime: Long = 0;
var tids: String = "";
var tlevel: Integer = 0;
var tboot: Long = 0;
var rNo: Integer = 0;
var varType: String = "";
var uids: List[TRUEntry] = Nil;
I'm parsing the avro file in a separate class.
I have to map the tids column with each one of the uids, in the same way as in the accepted answer of the question linked above, except this time the data comes from an avro file rather than a well-formatted CSV file. How can I do this?
This is the code I'm trying to do it with:
import com.databricks.spark.avro._  // needed for spark.read.avro in Spark 2.x

val avroRow = spark.read.avro(inputString).rdd
val avroParsed = avroRow
  .map(x => new TRParser(x))
  .map((obj: TRParser) => ((obj.tids, obj.uId), 1))
  .reduceByKey(_ + _)
  .saveAsTextFile(outputString)
After obj.tids, each of the uids has to be mapped individually, so that the final output is the same as in the accepted answer of the question linked above.
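To illustrate the pairing I'm after, here is a sketch of the same flatMap-then-count pattern on plain Scala collections, using stand-in case classes with the field names from my parser (TRParser, TRUEntry, tids, uId are my own names; on the actual RDD the calls would be flatMap and reduceByKey):

```scala
// Stand-in case classes mirroring the parser fields (hypothetical
// simplified versions of my TRParser / TRUEntry classes).
case class TRUEntry(uId: String, initM: Long)
case class TRParser(tids: String, uids: List[TRUEntry])

// Pair tids with every uId individually, then count each pair.
// On a Spark RDD this would be .flatMap(...).reduceByKey(_ + _).
def pairCounts(rows: List[TRParser]): Map[(String, String), Int] =
  rows
    .flatMap(obj => obj.uids.map(u => ((obj.tids, u.uId), 1))) // one record per (tids, uId)
    .groupBy(_._1)                                             // group identical pairs
    .mapValues(_.map(_._2).sum)                                // sum the 1s per pair
    .toMap
```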
This is how I'm parsing all the uids in the avro file parsing class:
this.uids = Nil
row.getAs[Seq[Row]]("uids").foreach { (objRow: Row) =>
  this.uids ::= new TRUEntry(objRow)
}

this.uids.foreach { (obj: TRUEntry) =>
  uInfo += obj.uId + " , " + obj.initM.toString + " , "
}
P.S.: I apologise if the question seems dumb, but this is my first encounter with avro files.