I have a RDD which would be like ((String, String), TimeStamp)
. I have large number of records and I want to select for each key the record with latest TimeStamp value. I have tried the following code and still struggling to to this. Can anybody help me to do this ?
The below code I tried is wrong and not working as well
val context = sparkSession.read.format("jdbc")
.option("driver", "com.mysql.jdbc.Driver")
.option("url", url)
.option("dbtable", "student_risk")
.option("user", "user")
.option("password", "password")
.load()
context.cache();
val studentRDD = context.rdd.map(r => ((r.getString(r.fieldIndex("course_id")), r.getString(r.fieldIndex("student_id"))), r.getTimestamp(r.fieldIndex("risk_date_time"))))
val filteredRDD = studentRDD.collect().map(z => (z._1, z._2)).reduce((x, y) => (x._2.compareTo(y._2)))