I want to turn the following into a single RDD of the form (id, (all names sharing that id)):
>val test = rddByKey.map{case(k,v)=> (k,v.collect())}
test: Array[(String, Array[String])] =
Array(
(45000,Array(Amit, Pavan, Ratan)),
(10000,Array(Kumar, Venkat, Sheela)),
(50000,Array(Tejas, Dinesh, Lokesh, Bhupesh))
)
I want to print it like this:
(45000,(Amit, Pavan, Ratan))
(10000,(Kumar, Venkat, Sheela))
This is what I have tried:
val data = sc.textFile("/user/cloudera/data.csv")
val rdd = data.map(r=>(r.split(",")(0),r.split(",")(1)))
val groupByKey = rdd.groupByKey().collect()
val rddByKey = groupByKey.map{case(k,v) => k->sc.makeRDD(v.toSeq)}
val test = rddByKey.map{case(k,v)=> (k,v.collect())}
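For reference, the desired printed form differs from the REPL output only in how each group is rendered, so one possible approach is to format each group as a string with mkString instead of building an RDD per key. The sketch below is a minimal illustration under stated assumptions: GroupPrintSketch and formatGroup are hypothetical names that are not in the code above, and a plain Scala Seq stands in for the parsed CSV pairs so that the snippet runs without a Spark cluster.

```scala
object GroupPrintSketch {
  // Hypothetical helper (not from the original post):
  // renders one group in the form (key,(a, b, c)).
  def formatGroup(key: String, names: Iterable[String]): String = {
    val joined = names.mkString("(", ", ", ")")
    s"($key,$joined)"
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the (key, name) pairs parsed from data.csv;
    // the sample values are taken from the output shown in the question.
    val pairs = Seq(
      "45000" -> "Amit", "45000" -> "Pavan", "45000" -> "Ratan",
      "10000" -> "Kumar", "10000" -> "Venkat", "10000" -> "Sheela"
    )

    // Local equivalent of groupByKey: collect the names under each key.
    val grouped = pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

    // Print one line per key; the order of the keys is not guaranteed.
    grouped.foreach { case (k, names) => println(formatGroup(k, names)) }

    // With the pair RDD named rdd above, and assuming the grouped data
    // fits in driver memory, the same helper could be applied as:
    //   rdd.groupByKey().collect().foreach { case (k, names) => println(formatGroup(k, names)) }
  }
}
```

Because groupByKey already yields an Iterable[String] per key, this avoids the sc.makeRDD step and the second collect() entirely.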